@rgrove/parse-xml
Changelog
4.1.0 (2023-02-04)
Added a new `includeOffsets` parser option. #25

When `true`, the starting and ending byte offsets of each node in the input string will be made available via `start` and `end` properties on the node. The default is `false`.
This option is useful if you want to preserve the original source text of each node when later serializing a document back to XML. Previously, that source text was always discarded during parsing, so it couldn't survive a parse/serialize round trip.
```js
const { parseXml } = require('@rgrove/parse-xml');

let xml = '<root><child /></root>';
let doc = parseXml(xml, { includeOffsets: true });

console.log(doc.root.toJSON());
// => { type: 'element', name: 'root', start: 0, end: 22, ... }

console.log(doc.root.children[0].toJSON());
// => { type: 'element', name: 'child', start: 6, end: 15, ... }
```
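As a quick illustration of how the offsets relate to the input (a self-contained sketch using plain string operations, not a library API), a node's original source text can be recovered with `String.prototype.slice`:

```js
// Offsets as reported for the <child /> element in the example above.
const xml = '<root><child /></root>';
const start = 6;
const end = 15;

// slice(start, end) returns exactly the substring covered by the node.
console.log(xml.slice(start, end)); // => '<child />'
```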
Added a new `preserveXmlDeclaration` parser option. #31

When `true`, an `XmlDeclaration` node representing the XML declaration (if there is one) will be included in the parsed document. When `false`, the XML declaration will be discarded. The default is `false`, which matches the behavior of previous versions.
This option is useful if you want to preserve the XML declaration when later serializing a document back to XML. Previously, the declaration was always discarded during parsing, so it couldn't survive a parse/serialize round trip.
```js
const { parseXml } = require('@rgrove/parse-xml');

let xml = '<?xml version="1.0" encoding="UTF-8"?><root />';
let doc = parseXml(xml, { preserveXmlDeclaration: true });

console.log(doc.children[0].toJSON());
// => { type: 'xmldecl', version: '1.0', encoding: 'UTF-8' }
```
Added a new `preserveDocumentType` parser option. #32

When `true`, an `XmlDocumentType` node representing a document type declaration (if there is one) will be included in the parsed document. When `false`, any document type declaration encountered will be discarded. The default is `false`, which matches the behavior of previous versions.
Note that the parser only includes the document type declaration in the node tree; it doesn't actually validate the document against the DTD, load external DTDs, or resolve custom entity references.
This option is useful if you want to preserve the document type declaration when later serializing a document back to XML. Previously, it was always discarded during parsing, so serialized output could never include it.
```js
const { parseXml } = require('@rgrove/parse-xml');

let xml = '<!DOCTYPE root SYSTEM "root.dtd"><root />';
let doc = parseXml(xml, { preserveDocumentType: true });

console.log(doc.children[0].toJSON());
// => { type: 'doctype', name: 'root', systemId: 'root.dtd' }

xml = '<!DOCTYPE kittens [<!ELEMENT kittens (#PCDATA)>]><kittens />';
doc = parseXml(xml, { preserveDocumentType: true });

console.log(doc.children[0].toJSON());
// => {
//      type: 'doctype',
//      name: 'kittens',
//      internalSubset: '<!ELEMENT kittens (#PCDATA)>'
//    }
```
Errors thrown by the parser are now instances of a new `XmlError` class, which extends `Error`. These errors still have all the same properties as before, but now with improved type definitions. #27

Leading and trailing whitespace in comment content is no longer trimmed. This issue only affected parsing when the `preserveComments` parser option was enabled. #28

Text content following a CDATA section is no longer appended to the preceding `XmlCdata` node. This issue only affected parsing when the `preserveCdata` parser option was enabled. #29
Readme
A fast, safe, compliant XML parser for Node.js and browsers.
npm install @rgrove/parse-xml
Or, if you like living dangerously, you can load the minified bundle in a browser via Unpkg and use the `parseXml` global.
- Returns a convenient object tree representing an XML document.
- Works great in Node.js and browsers.
- Provides helpful, detailed error messages with context when a document is not well-formed.
- Mostly conforms to XML 1.0 (Fifth Edition) as a non-validating parser (see below for details).
- Passes all relevant tests in the XML Conformance Test Suite.
- Written in TypeScript and compiled to ES2020 JavaScript for Node.js and ES2017 JavaScript for browsers. The browser build is also optimized for minification.
- Zero dependencies.
While this parser is capable of parsing document type declarations (`<!DOCTYPE ... >`) and including them in the node tree, it doesn't actually do anything with them. External document type definitions won't be loaded, and the parser won't validate the document against a DTD or resolve custom entity references defined in a DTD.
In addition, the only supported character encoding is UTF-8 because it's not feasible (or useful) to support other character encodings in JavaScript.
ESM

```js
import { parseXml } from '@rgrove/parse-xml';
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');
```

CommonJS

```js
const { parseXml } = require('@rgrove/parse-xml');
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');
```
The result is an `XmlDocument` instance containing the parsed document, with a structure that looks like this (some properties and methods are excluded for clarity; see the API docs for details):
```js
{
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: {
        fuzzy: 'yes'
      },
      children: [
        {
          type: 'text',
          text: 'I like fuzzy kittens.'
        }
      ],
      parent: { ... },
      isRootNode: true
    }
  ]
}
```
All parse-xml objects have `toJSON()` methods that return JSON-serializable objects, so you can easily convert an XML document to JSON:

```js
let json = JSON.stringify(parseXml(xml));
```
When something goes wrong, parse-xml throws an error that tells you exactly what happened and shows you where the problem is so you can fix it.
```js
parseXml('<foo><bar>baz</foo>');
```

Output

```
Error: Missing end tag for element bar (line 1, column 14)

<foo><bar>baz</foo>
             ^
```
In addition to a helpful message, error objects have the following properties:

- `column` _(Number)_ — Column where the error occurred (1-based).
- `excerpt` _(String)_ — Excerpt from the input string that contains the problem.
- `line` _(Number)_ — Line where the error occurred (1-based).
- `pos` _(Number)_ — Character position where the error occurred relative to the beginning of the input (0-based).
There are many XML parsers for Node, and some of them are good. However, most of them suffer from one or more of the following shortcomings:
- Native dependencies.
- Loose, non-standard parsing behavior that can lead to unexpected or even unsafe results when given input the author didn't anticipate.
- Kitchen sink APIs that tightly couple a parser with DOM manipulation functions, a stringifier, or other tooling that isn't directly related to parsing and consuming XML.
- Stream-based parsing. This is great in the rare case that you need to parse truly enormous documents, but can be a pain to work with when all you want is a node tree.
- Poor error handling.
- Too big or too Node-specific to work well in browsers.
parse-xml's goal is to be a small, fast, safe, compliant, non-streaming, non-validating, browser-friendly parser, because I think this is an under-served niche.
I think parse-xml demonstrates that it's not necessary to jettison the spec entirely or to write complex code in order to implement a small, fast XML parser.
Also, it was fun.
Here's how parse-xml's performance stacks up against a few comparable libraries:
While libxmljs2 is faster at parsing medium and large documents, its performance comes at the expense of a large C dependency, no browser support, and a history of security vulnerabilities in the underlying libxml2 library.
In these results, "ops/s" refers to operations per second. Higher is faster.
```
Node.js v18.14.0 / Darwin arm64
Apple M1 Max

Running "Small document (291 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    191 553 ops/s, ±0.10% | fastest

  fast-xml-parser 4.1.1:
    142 565 ops/s, ±0.11% | 25.57% slower

  libxmljs2 0.31.0 (native):
    74 646 ops/s, ±0.30% | 61.03% slower

  xmldoc 1.2.0 (sax-js):
    66 823 ops/s, ±0.09% | slowest, 65.12% slower

Finished 4 cases!
  Fastest: @rgrove/parse-xml 4.1.0
  Slowest: xmldoc 1.2.0 (sax-js)

Running "Medium document (72081 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    1 065 ops/s, ±0.11% | 49.81% slower

  fast-xml-parser 4.1.1:
    637 ops/s, ±0.12% | 69.98% slower

  libxmljs2 0.31.0 (native):
    2 122 ops/s, ±2.48% | fastest

  xmldoc 1.2.0 (sax-js):
    444 ops/s, ±0.36% | slowest, 79.08% slower

Finished 4 cases!
  Fastest: libxmljs2 0.31.0 (native)
  Slowest: xmldoc 1.2.0 (sax-js)

Running "Large document (1162464 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    93 ops/s, ±0.10% | 53.27% slower

  fast-xml-parser 4.1.1:
    48 ops/s, ±0.60% | 75.88% slower

  libxmljs2 0.31.0 (native):
    199 ops/s, ±1.47% | fastest

  xmldoc 1.2.0 (sax-js):
    38 ops/s, ±0.09% | slowest, 80.9% slower

Finished 4 cases!
  Fastest: libxmljs2 0.31.0 (native)
  Slowest: xmldoc 1.2.0 (sax-js)
```
See the parse-xml-benchmark repo for instructions on how to run this benchmark yourself.
FAQs
A fast, safe, compliant XML parser for Node.js and browsers.
The npm package @rgrove/parse-xml receives a total of 78,650 weekly downloads. As such, @rgrove/parse-xml popularity was classified as popular.
We found that @rgrove/parse-xml demonstrates an unhealthy version release cadence and project activity, as the last version was released a year ago. It has 1 open source maintainer collaborating on the project.