Security News
RubyGems.org Adds New Maintainer Role
RubyGems.org has added a new "maintainer" role that allows for publishing new versions of gems. This new permission type is aimed at improving security for gem owners and the service overall.
The sax npm package is a streaming XML parser that is designed for speed and simplicity. It follows the SAX parsing approach, which is an event-based model for parsing XML documents. This allows developers to handle different parts of the document as they are parsed without keeping the entire document in memory.
Parsing XML
This code demonstrates how to parse an XML string. It creates a new SAX parser, sets up an event listener for the 'opentag' event to log the name and attributes of each tag, and then writes an XML string to the parser.
const sax = require('sax'),
  parser = sax.parser(true);
parser.onopentag = function (node) {
  // node has attributes with string values
  console.log(node.name + ' - ' + JSON.stringify(node.attributes));
};
parser.write('<xml><tag attr="value">content</tag></xml>').close();
Stream Parsing
This code demonstrates how to parse XML from a file stream. It creates a SAX stream, sets up an event listener for the 'opentag' event, and then pipes a read stream from a file into the SAX stream.
const sax = require('sax'),
  fs = require('fs'),
  saxStream = sax.createStream(true);
saxStream.on('opentag', function (node) {
  console.log(node.name + ' - ' + JSON.stringify(node.attributes));
});
fs.createReadStream('file.xml').pipe(saxStream);
Error Handling
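This code demonstrates how to handle errors during parsing. It sets up an error event listener on the SAX parser, logs each parsing error, and calls resume() to clear the error so that later calls to write() and close() do not rethrow it.
const sax = require('sax'),
  parser = sax.parser(true);
parser.onerror = function (e) {
  // an error happened: report it, then clear it so parsing can continue.
  console.error(e.message);
  parser.resume();
};
// the mismatched closing tag makes this input malformed in strict mode.
parser.write('<xml><a>this is some malformed xml</xml>').close();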
xml2js is a similar npm package that provides an XML to JavaScript object converter. Unlike sax, which is a streaming parser, xml2js reads the entire XML document and converts it into a JavaScript object. This can be more convenient for small documents but less efficient for large ones.
fast-xml-parser is another alternative that offers both a streaming mode and a non-streaming mode for XML parsing. It claims to be very fast and flexible, providing options to validate, parse, and traverse XML. It is a good alternative to sax when performance is a critical factor and when additional features like validation are needed.
libxmljs is a binding to the libxml C library, providing a more traditional DOM-based approach to parsing XML. It is different from sax in that it builds an in-memory representation of the entire document, which can then be queried and manipulated. This is more powerful but also more resource-intensive than the event-based approach of sax.
A sax-style parser for XML and HTML.
Designed with node in mind, but should work fine in the browser or other CommonJS implementations.
<!DOCTYPEs and <!ENTITYs
The parser will handle the basic XML entities in text nodes and attribute values: &amp; &lt; &gt; &apos; &quot;. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the ondoctype event, and then fetch the doctypes, and read the entities and add them to parser.ENTITIES, then be my guest.
Unknown entities will fail in strict mode, and in loose mode, will pass through unmolested.
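The do-it-yourself route described above could look roughly like the sketch below. Both extractEntities and installEntitySupport are hypothetical helpers, not part of sax. The sketch assumes that the string passed to the doctype handler contains the internal DTD subset, and it only recognises simple internal entities with a quoted literal value (parameter entities and SYSTEM/PUBLIC entities are skipped).

```javascript
// Hypothetical helper (not part of sax): collect declarations of the form
// <!ENTITY name "value"> or <!ENTITY name 'value'> from a doctype string.
function extractEntities(doctype) {
  const entities = {};
  const pattern = /<!ENTITY\s+([A-Za-z_:][\w:.-]*)\s+(?:"([^"]*)"|'([^']*)')\s*>/g;
  let match;
  while ((match = pattern.exec(doctype)) !== null) {
    // match[2] holds a double-quoted value, match[3] a single-quoted one.
    entities[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return entities;
}

// Hypothetical wiring: whenever the parser reports a doctype, copy the
// declared entities onto parser.ENTITIES so later text can reference them.
function installEntitySupport(parser) {
  parser.ondoctype = function (doctype) {
    Object.assign(parser.ENTITIES, extractEntities(doctype));
  };
  return parser;
}

// Usage with sax (assumes the package is installed):
//   const parser = installEntitySupport(require('sax').parser(true));
//   parser.write('<!DOCTYPE doc [ <!ENTITY foo "bar"> ]><doc>&foo;</doc>').close();
```

Entity values are stored verbatim; nested entity references inside a value are not expanded by this sketch.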
var sax = require("./lib/sax"),
  strict = true, // set to false for html-mode
  parser = sax.parser(strict);
parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text. t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag. node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute. attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};
parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
Pass the following arguments to the parser function. All are optional.
strict - Boolean. Whether or not to be a jerk. Default: false.
opt - Object bag of settings regarding string formatting. All default to false.
Settings supported:
trim - Boolean. Whether or not to trim text and comment nodes.
normalize - Boolean. If true, then turn any whitespace into a single space.
lowercasetags - Boolean. If true, then lowercase tags in loose mode, rather than uppercasing them.
Methods
write - Write bytes onto the stream. You don't have to do this all at once. You can keep writing as much as you want.
close - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.
resume - To gracefully handle errors, assign a listener to the error event. Then, when the error is taken care of, you can call resume to continue parsing. Otherwise, the parser will not continue while in an error state.
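Because write does not need the whole document at once, input can be fed in pieces as it arrives. The sketch below illustrates this with a hypothetical helper, writeInChunks, which is not part of sax; it only assumes that the object it receives has the write and close methods described above.

```javascript
// Hypothetical helper (not part of sax): feed a string to a parser-like
// object in fixed-size pieces, then close it.
function writeInChunks(parser, xml, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError('chunkSize must be a positive number');
  }
  for (let offset = 0; offset < xml.length; offset += chunkSize) {
    // Each piece may end in the middle of a tag; the parser is expected
    // to carry its state over to the next write() call.
    parser.write(xml.slice(offset, offset + chunkSize));
  }
  return parser.close();
}

// Usage with sax (assumes the package is installed):
//   const parser = require('sax').parser(true);
//   parser.onopentag = (node) => console.log(node.name);
//   writeInChunks(parser, '<xml><tag attr="value">content</tag></xml>', 8);
```

In real code the pieces would typically come from a socket or file rather than from slicing a string that is already in memory.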
At all times, the parser object will have the following members:
line, column, position - Indications of the position in the XML document where the parser currently is looking.
closed - Boolean indicating whether or not the parser can be written to. If it's true, then wait for the ready event to write again.
strict - Boolean indicating whether or not the parser is a jerk.
opt - Any options passed into the constructor.
And a bunch of other stuff that you probably shouldn't touch.
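One way to put the line, column, and position members to use is to record where each error occurred. The following is a minimal sketch, not an API provided by sax: collectErrors is a hypothetical helper that assigns an onerror handler, snapshots the position members, and calls resume so that parsing continues.

```javascript
// Hypothetical helper (not part of sax): record every parse error together
// with the parser's position at the time, then let parsing continue.
function collectErrors(parser) {
  const errors = [];
  parser.onerror = function (err) {
    errors.push({
      message: err.message,
      line: parser.line,
      column: parser.column,
      position: parser.position
    });
    // Clear the error state; otherwise the parser stays in an error state.
    parser.resume();
  };
  return errors;
}

// Usage with sax (assumes the package is installed):
//   const parser = require('sax').parser(true);
//   const errors = collectErrors(parser);
//   parser.write('<xml><a>oops</xml>').close();
//   console.log(errors);
```

The values are stored exactly as the parser reports them; whether line and column count from zero or one is not stated above, so check before showing them to users.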
All events emit with a single argument. To listen to an event, assign a function to on<eventname>. Functions get executed in the this-context of the parser object. The list of supported events is also in the exported EVENTS array.
error - Indication that something bad happened. The error will be hanging out on parser.error, and must be deleted before parsing can continue. By listening to this event, you can keep an eye on that kind of stuff. Note: this happens much more in strict mode. Argument: instance of Error.
text - Text node. Argument: string of text.
doctype - The <!DOCTYPE declaration. Argument: doctype string.
processinginstruction - Stuff like <?xml foo="blerg" ?>. Argument: object with name and body members. Attributes are not parsed, as processing instructions have implementation dependent semantics.
sgmldeclaration - Random SGML declarations. Stuff like <!ENTITY p> would trigger this kind of event. This is a weird thing to support, so it might go away at some point. SAX isn't intended to be used to parse SGML, after all.
opentag - An opening tag. Argument: object with name and attributes. In non-strict mode, tag names are uppercased.
closetag - A closing tag. In loose mode, tags are auto-closed if their parent closes. In strict mode, well-formedness is enforced. Note that self-closing tags will have closetag emitted immediately after opentag. Argument: tag name.
attribute - An attribute node. Argument: object with name and value.
comment - A comment node. Argument: the string of the comment.
cdata - A <![CDATA[ block. Since <![CDATA[ blocks can get quite large, this event may fire multiple times for a single block, if it is broken up into multiple write()s. Argument: the string of random character data.
end - Indication that the closed stream has ended.
ready - Indication that the stream has reset, and is ready to be written to.
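To show how the opentag, text, and closetag events fit together, here is a minimal sketch that assembles them into a nested structure. attachTreeBuilder is a hypothetical helper rather than part of sax, and the shape of the tree (name, attributes, children) is an assumption chosen for illustration.

```javascript
// Hypothetical helper (not part of sax): build a simple tree from the
// opentag, text, and closetag events of a parser-like object.
function attachTreeBuilder(parser) {
  const root = { name: null, attributes: {}, children: [] };
  const stack = [root]; // the last entry is the element currently open

  parser.onopentag = function (node) {
    const element = { name: node.name, attributes: node.attributes, children: [] };
    stack[stack.length - 1].children.push(element);
    stack.push(element);
  };
  parser.ontext = function (text) {
    stack[stack.length - 1].children.push(text);
  };
  parser.onclosetag = function () {
    // Never pop the synthetic root, even if closetag fires unexpectedly.
    if (stack.length > 1) stack.pop();
  };
  return root;
}

// Usage with sax (assumes the package is installed):
//   const parser = require('sax').parser(true);
//   const tree = attachTreeBuilder(parser);
//   parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
//   console.log(JSON.stringify(tree.children[0], null, 2));
```

Building a full tree gives up the memory advantage of event-based parsing, so this approach suits small documents or selected subtrees.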
Todo
Build an HTML parser on top of this, which follows the same parsing rules as web browsers.
Make it fast by replacing the trampoline with a switch, and not buffering so much stuff.
FAQs
An evented streaming XML parser in JavaScript
The npm package sax receives a total of 20,740,037 weekly downloads. As such, its popularity is classified as popular.
We found that sax demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.