sax

Package description

What is sax?

The sax npm package is a streaming XML parser that is designed for speed and simplicity. It follows the SAX parsing approach, which is an event-based model for parsing XML documents. This allows developers to handle different parts of the document as they are parsed without keeping the entire document in memory.

What are sax's main functionalities?

Parsing XML

This code demonstrates how to parse an XML string. It creates a new SAX parser, sets up an event listener for the 'opentag' event to log the name and attributes of each tag, and then writes an XML string to the parser.

const sax = require('sax'),
  parser = sax.parser(true);
parser.onopentag = function (node) {
  // node has attributes with string values
  console.log(node.name + ' - ' + JSON.stringify(node.attributes));
};
parser.write('<xml><tag attr="value">content</tag></xml>').close();

Stream Parsing

This code demonstrates how to parse XML from a file stream. It creates a SAX stream, sets up an event listener for the 'opentag' event, and then pipes a read stream from a file into the SAX stream.

const sax = require('sax'),
  fs = require('fs'),
  saxStream = sax.createStream(true);
saxStream.on('opentag', function (node) {
  console.log(node.name + ' - ' + JSON.stringify(node.attributes));
});
fs.createReadStream('file.xml').pipe(saxStream);

Error Handling

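This code demonstrates how to handle errors during parsing. It assigns an onerror handler that logs the error and calls resume() so the parser can continue, then writes XML with an unclosed tag to trigger it.

const sax = require('sax'),
  parser = sax.parser(true);
parser.onerror = function (e) {
  // an error happened; log it, then clear it so parsing can continue.
  console.error(e.message);
  parser.resume();
};
// <unclosed> is never closed, so strict mode reports an error.
parser.write('<xml><unclosed></xml>').close();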

sax js

A sax-style parser for XML and HTML.

Designed with node in mind, but should work fine in the browser or other CommonJS implementations.

What This Is

  • A very simple tool to parse through an XML string.
  • A stepping stone to a streaming HTML parser.
  • A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML docs.

What This Is (probably) Not

  • An HTML Parser - That's the goal, but this isn't it. It's just XML for now.
  • A DOM Builder - You can use it to build an object model out of XML, but it doesn't do that out of the box.
  • XSLT - No DOM, no querying.
  • 100% Compliant with (some other SAX implementation) - Most SAX implementations are in Java and do a lot more than this does.
  • An XML Validator - It does a little validation when in strict mode, but not much.
  • A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic masochism.
  • A DTD-aware Thing - Fetching DTDs is a much bigger job.

Regarding <!DOCTYPEs and <!ENTITYs

The parser will handle the basic XML entities in text nodes and attribute values: &amp; &lt; &gt; &apos; &quot;. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the ondoctype event, and then fetch the doctypes, and read the entities and add them to parser.ENTITIES, then be my guest.

Unknown entities will fail in strict mode, and in loose mode, will pass through unmolested.

Usage

var sax = require("sax"), // or "./lib/sax" when running from a checkout of the repo
  strict = true, // set to false for html-mode
  parser = sax.parser(strict);

parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text.  t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag.  node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute.  attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

Arguments

Pass the following arguments to the parser function. All are optional.

strict - Boolean. Whether or not to be a jerk. Default: false.

opt - Object bag of settings regarding string formatting. All default to false. Settings supported:

  • trim - Boolean. Whether or not to trim text and comment nodes.
  • normalize - Boolean. If true, then turn any whitespace into a single space.
  • lowercasetags - Boolean. If true, then lowercase tags in loose mode, rather than uppercasing them.

Methods

write - Write bytes onto the stream. You don't have to do this all at once. You can keep writing as much as you want.

close - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.

resume - To gracefully handle errors, assign a listener to the error event. Then, when the error is taken care of, you can call resume to continue parsing. Otherwise, the parser will not continue while in an error state.

Members

At all times, the parser object will have the following members:

line, column, position - Indications of the position in the XML document where the parser currently is looking.

closed - Boolean indicating whether or not the parser can be written to. If it's true, then wait for the ready event to write again.

strict - Boolean indicating whether or not the parser is a jerk.

opt - Any options passed into the constructor.

And a bunch of other stuff that you probably shouldn't touch.

Events

All events emit with a single argument. To listen to an event, assign a function to on<eventname>. Functions get executed in the this-context of the parser object. The list of supported events is also available in the exported EVENTS array.

error - Indication that something bad happened. The error will be hanging out on parser.error, and must be deleted before parsing can continue. By listening to this event, you can keep an eye on that kind of stuff. Note: this happens much more in strict mode. Argument: instance of Error.

text - Text node. Argument: string of text.

doctype - The <!DOCTYPE declaration. Argument: doctype string.

processinginstruction - Stuff like <?xml foo="blerg" ?>. Argument: object with name and body members. Attributes are not parsed, as processing instructions have implementation dependent semantics.

sgmldeclaration - Random SGML declarations. Stuff like <!ENTITY p> would trigger this kind of event. This is a weird thing to support, so it might go away at some point. SAX isn't intended to be used to parse SGML, after all.

opentag - An opening tag. Argument: object with name and attributes. In non-strict mode, tag names are uppercased, unless the lowercasetags option is set.

closetag - A closing tag. In loose mode, tags are auto-closed if their parent closes. In strict mode, well-formedness is enforced. Note that self-closing tags will have closetag emitted immediately after opentag. Argument: tag name.

attribute - An attribute node. Argument: object with name and value.

comment - A comment node. Argument: the string of the comment.

opencdata - The opening tag of a <![CDATA[ block.

cdata - The text of a <![CDATA[ block. Since <![CDATA[ blocks can get quite large, this event may fire multiple times for a single block, if it is broken up into multiple write()s. Argument: the string of random character data.

closecdata - The closing tag (]]>) of a <![CDATA[ block.

end - Indication that the closed stream has ended.

ready - Indication that the stream has reset, and is ready to be written to.

Todo

Build an HTML parser on top of this, which follows the same parsing rules as web browsers.

Make it fast by replacing the trampoline with a switch, and not buffering so much stuff.

Last updated on 08 Jul 2011
