
@rgrove/parse-xml


A fast, safe, compliant XML parser for Node.js and browsers.


Maintainers: 1
Install size: 157 kB

Changelog


3.0.0 (2021-01-23)

This release includes significant changes under the hood (such as a brand new parser!), but backwards compatibility has been a high priority. Most users should be able to upgrade without needing to make any changes (or with only minimal changes).

Added

  • XML processing instructions are now included in parsed documents as XmlProcessingInstruction nodes (with the type value "pi"). Previously they were discarded.

  • A new sortAttributes option. When true, attributes are sorted alphabetically by name in an element's attributes object. Sorting is no longer the default behavior.

  • TypeScript type definitions. While parse-xml is still written in JavaScript, it now has TypeScript-friendly JSDoc comments throughout, with strict type checking enabled. These comments are now used to generate type definitions at build time.
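For code that already holds a parsed attributes object and relied on the old alphabetical ordering, the effect of sortAttributes can be approximated after the fact. This is a minimal sketch, not the library's implementation: the helper name sortAttributeNames and the sample attributes are made up for illustration, and the comparison is assumed to be JavaScript's default string sort.

```javascript
// Hypothetical helper (not part of parse-xml): returns a copy of an
// attributes object whose keys are in default string sort order.
function sortAttributeNames(attributes) {
  const sorted = {};
  for (const name of Object.keys(attributes).sort()) {
    sorted[name] = attributes[name];
  }
  return sorted;
}

// Illustrative attributes in the order they might appear in a document.
const inDocumentOrder = { zebra: '1', apple: '2', mango: '3' };
const alphabetical = sortAttributeNames(inDocumentOrder);

console.log(Object.keys(inDocumentOrder)); // document order is untouched
console.log(Object.keys(alphabetical));    // sorted copy
```

Setting the sortAttributes parser option to true is the simpler route when the ordering is needed everywhere.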

Changed

  • The minimum supported Node.js version is now 12.x, and the minimum supported ECMAScript version is ES2017. Extremely old browsers (like IE11) are no longer supported out of the box, but you can still transpile parse-xml yourself if you need to support old browsers.

  • The XML parser has been completely rewritten with the primary goals of improving robustness and safety.

    While the previous parser was good, it relied heavily on complex regular expressions. This helped keep it extremely small, but also left it open to the possibility of regex denial of service bugs when parsing unusual or maliciously crafted input.

    The new parser uses a less interesting but overall safer approach, and employs regular expressions only sparingly and in ways that aren't risky (they're now only used as performance optimizations rather than as the basis for the entire parser).

  • The parseXml() function now returns an XmlDocument instance instead of a plain object. Its properties are backwards compatible.

  • Other node types (elements, text nodes, CDATA nodes, and comments) are also now represented by class instances (XmlElement, XmlText, XmlCdata, and XmlComment) rather than plain objects. Their properties are all backwards compatible.

  • Attributes are no longer sorted alphabetically by name in an element's attributes object by default. They're now defined in the same order that they're encountered in the document being parsed, unless the sortAttributes parser option is true.

  • If the value returned by an optional resolveUndefinedEntity function is not a string, null, or undefined, a TypeError will now be thrown. If you don't pass a custom resolveUndefinedEntity function to parseXml(), then this change won't affect you.

  • Some error messages have been changed to improve clarity, and more helpful errors have been added in some scenarios that previously would have resulted in generic or less helpful errors.

  • The browser field in package.json has been removed and the main field now points both Node.js and browser bundlers to the same untranspiled CommonJS source.

    When bundled using your favorite bundler, parse-xml will work great in all modern browsers with no transpilation needed. If you don't want to use a bundler, you can still use the prepackaged UMD bundle at dist/umd/parse-xml.min.js, which provides a parseXml global.
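The resolveUndefinedEntity rule above (return a string, null, or undefined, otherwise a TypeError is thrown) can be sketched as follows. The entity table, the checked wrapper, and the assumption that the resolver may receive the reference with or without its leading & and trailing ; are all illustrative and not taken from the changelog.

```javascript
// Hypothetical lookup table of custom entities (names are illustrative).
const customEntities = { kitty: 'Fluffy', copy: '\u00A9' };

// Returns a replacement string, or undefined to leave the entity unresolved.
// Strips a leading "&" and trailing ";" so either argument form works.
function resolveUndefinedEntity(ref) {
  const name = String(ref).replace(/^&/, '').replace(/;$/, '');
  return Object.prototype.hasOwnProperty.call(customEntities, name)
    ? customEntities[name]
    : undefined;
}

// Hypothetical wrapper that rejects any return value other than a string,
// null, or undefined, mirroring the rule described in the changelog.
function checked(resolver) {
  return (ref) => {
    const value = resolver(ref);
    if (value !== null && value !== undefined && typeof value !== 'string') {
      throw new TypeError(`resolver returned ${typeof value} for ${ref}`);
    }
    return value;
  };
}

console.log(checked(resolveUndefinedEntity)('&kitty;'));
```

A resolver like this would be passed to parseXml() as the resolveUndefinedEntity option.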

Readme


parse-xml

A fast, safe, compliant XML parser for Node.js and browsers.


Installation

npm install @rgrove/parse-xml

Or, if you like living dangerously, you can load the minified UMD bundle in a browser via Unpkg and use the parseXml global.

Features

Not Features

This parser currently discards document type declarations (<!DOCTYPE ... >) and all their contents, because they're rarely useful and some of their features aren't safe when the XML being parsed comes from an untrusted source.

In addition, the only supported character encoding is UTF-8 because it's not feasible (or useful) to support other character encodings in JavaScript.

API

See API.md for complete API docs.

Examples

Basic Usage

const parseXml = require('@rgrove/parse-xml');
let doc = parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

The result is an XmlDocument instance containing the parsed document, with a structure that looks like this (some properties and methods are excluded for clarity; see the API docs for details):

{
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: {
        fuzzy: 'yes'
      },
      children: [
        {
          type: 'text',
          text: 'I like fuzzy kittens.'
        }
      ],
      parent: { ... },
      isRootNode: true
    }
  ]
}
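Because every node carries a type and, where relevant, a children array, the tree can be consumed with a short recursive walk. The sketch below runs against a plain-object stand-in that copies the structure shown above rather than a real XmlDocument, so it works without the package installed; collectText and elementNames are hypothetical helper names.

```javascript
// Plain-object stand-in for the structure shown above (not a real XmlDocument).
const doc = {
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: { fuzzy: 'yes' },
      children: [{ type: 'text', text: 'I like fuzzy kittens.' }]
    }
  ]
};

// Recursively concatenates the text of every text node beneath `node`.
function collectText(node) {
  if (node.type === 'text') {
    return node.text;
  }
  return (node.children || []).map(collectText).join('');
}

// Collects the names of all element nodes in document order.
function elementNames(node, names = []) {
  if (node.type === 'element') {
    names.push(node.name);
  }
  for (const child of node.children || []) {
    elementNames(child, names);
  }
  return names;
}

console.log(collectText(doc));
console.log(elementNames(doc));
```

The same functions should work on the value returned by parseXml(), given that it exposes the type, children, name, and text properties shown above.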

Friendly Errors

When something goes wrong, parse-xml throws an error that tells you exactly what happened and shows you where the problem is so you can fix it.

parseXml('<foo><bar>baz</foo>');

Output

Error: Missing end tag for element bar (line 1, column 14)
  <foo><bar>baz</foo>
               ^

In addition to a helpful message, error objects have the following properties:

  • column Number

    Column where the error occurred (1-based).

  • excerpt String

    Excerpt from the input string that contains the problem.

  • line Number

    Line where the error occurred (1-based).

  • pos Number

    Character position where the error occurred relative to the beginning of the input (0-based).
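The properties listed above make it straightforward to report failures in a structured way. In this sketch the parser is passed in as an argument so the code runs on its own; fakeParse is a hypothetical stand-in that throws an error shaped like the one in the example, and its pos value of 13 is simply the 0-based counterpart of column 14, assumed for illustration.

```javascript
// Builds a one-line summary from the documented error properties.
function describeParseError(err) {
  return `${err.message} [line ${err.line}, column ${err.column}, pos ${err.pos}]`;
}

// `parse` is injected so the sketch does not depend on the installed package.
function tryParse(parse, xml) {
  try {
    return { ok: true, doc: parse(xml) };
  } catch (err) {
    return { ok: false, summary: describeParseError(err), excerpt: err.excerpt };
  }
}

// Hypothetical stand-in parser that always fails like the example above.
function fakeParse() {
  const err = new Error('Missing end tag for element bar');
  Object.assign(err, { line: 1, column: 14, pos: 13, excerpt: '<foo><bar>baz</foo>' });
  throw err;
}

const result = tryParse(fakeParse, '<foo><bar>baz</foo>');
console.log(result.summary);
```

In real use, the first argument would be the parseXml function exported by the package.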

Why another XML parser?

There are many XML parsers for Node, and some of them are good. However, most of them suffer from one or more of the following shortcomings:

  • Native dependencies.

  • Loose, non-standard parsing behavior that can lead to unexpected or even unsafe results when given input the author didn't anticipate.

  • Kitchen sink APIs that tightly couple a parser with DOM manipulation functions, a stringifier, or other tooling that isn't directly related to parsing and consuming XML.

  • Stream-based parsing. This is great in the rare case that you need to parse truly enormous documents, but can be a pain to work with when all you want is a node tree.

  • Poor error handling.

  • Too big or too Node-specific to work well in browsers.

parse-xml's goal is to be a small, fast, safe, compliant, non-streaming, non-validating, browser-friendly parser, because I think this is an under-served niche.

I think parse-xml demonstrates that it's not necessary to jettison the spec entirely or to write complex code in order to implement a small, fast XML parser.

Also, it was fun.

Benchmark

Here's how parse-xml stacks up against two comparable libraries, libxmljs2 (which is based on the native libxml library) and xmldoc (which is based on sax-js).

Node.js v14.15.4 / Darwin x64
Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz

Running "Small document (291 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 3.0.0:
    77 109 ops/s, ±0.46%   | fastest

  libxmljs2 0.26.6 (native):
    29 480 ops/s, ±4.62%   | slowest, 61.77% slower

  xmldoc 1.1.2 (sax-js):
    36 035 ops/s, ±0.62%   | 53.27% slower

Finished 3 cases!
  Fastest: @rgrove/parse-xml 3.0.0
  Slowest: libxmljs2 0.26.6 (native)

Running "Medium document (72081 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 3.0.0:
    321 ops/s, ±0.99%   | 54.34% slower

  libxmljs2 0.26.6 (native):
    703 ops/s, ±10.64%   | fastest

  xmldoc 1.1.2 (sax-js):
    235 ops/s, ±0.50%   | slowest, 66.57% slower

Finished 3 cases!
  Fastest: libxmljs2 0.26.6 (native)
  Slowest: xmldoc 1.1.2 (sax-js)

Running "Large document (1162464 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 3.0.0:
    20 ops/s, ±0.48%   | 72.97% slower

  libxmljs2 0.26.6 (native):
    74 ops/s, ±12.02%   | fastest

  xmldoc 1.1.2 (sax-js):
    19 ops/s, ±1.68%   | slowest, 74.32% slower

Finished 3 cases!
  Fastest: libxmljs2 0.26.6 (native)
  Slowest: xmldoc 1.1.2 (sax-js)

See the parse-xml-benchmark repo for instructions on running this benchmark yourself.
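The "slower" percentages in the output above are consistent with measuring each library's shortfall relative to the fastest case in its suite. The following sketch recomputes two of them under that assumption; the function name percentSlower is made up here, and the benchmark tool's actual formula is not stated in this document.

```javascript
// Percentage by which `opsPerSec` falls short of the fastest case.
function percentSlower(opsPerSec, fastestOpsPerSec) {
  return ((fastestOpsPerSec - opsPerSec) / fastestOpsPerSec) * 100;
}

// Small document: libxmljs2 (29 480 ops/s) vs. parse-xml (77 109 ops/s).
const small = percentSlower(29480, 77109);

// Medium document: parse-xml (321 ops/s) vs. libxmljs2 (703 ops/s).
const medium = percentSlower(321, 703);

console.log(small.toFixed(2), medium.toFixed(2));
```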

License

ISC License


Last updated on 23 Jan 2021
