[!NOTE]
This is one of 198 standalone projects, maintained as part
of the @thi.ng/umbrella monorepo
and anti-framework.
🚀 Please help me to work full-time on these projects by sponsoring me on
GitHub. Thank you! ❤️
About
@thi.ng/transducers-based, SAX-like, non-validating, configurable,
speedy & tiny XML parser (~1.8KB gzipped).
Unlike the classic event-driven approach of SAX, this parser is
implemented as a transducer function, transforming an XML input into a
stream of SAX-event-like objects. Being a transducer, the parser can be
used in novel ways as part of a larger processing pipeline and can be
composed with other pre- or post-processing steps, e.g. to filter or
transform element / attribute values or only do partial parsing with
early termination based on some condition.
Additionally, since by default the parser emits any children as part of
"element end" events, it can be used like a tree-walking DOM parser as
well (see SVG parsing example further below). The choice is yours!
Status
STABLE - used in production
Search or submit any issues for this package
Related packages
Installation
yarn add @thi.ng/sax
ESM import:
import * as sax from "@thi.ng/sax";
Browser ESM import:
<script type="module" src="https://esm.run/@thi.ng/sax"></script>
JSDelivr documentation
For Node.js REPL:
const sax = await import("@thi.ng/sax");
Package sizes (brotli'd, pre-treeshake): ESM: 1.41 KB
Dependencies
Note: @thi.ng/api is in most cases a type-only import (not used at runtime)
Usage examples
Two projects in this repo's /examples directory are using this package:
| Screenshot | Description | Live demo | Source |
|---|---|---|---|
| | SVG path parsing & dynamic resampling | Demo | Source |
| | XML/HTML/SVG to hiccup/JS conversion | Demo | Source |
API
Generated API docs
Basic usage
import * as sax from "@thi.ng/sax";
import * as tx from "@thi.ng/transducers";
const src = `<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo bar>
<!-- comment -->
<a>
<b1>
<c x="23" y="42">ccc
<d>dd</d>
</c>
</b1>
<b2 foo="bar" />
</a>`
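// transducer version: parse() without a source returns a transducer
const doc = [...tx.iterator(sax.parse(), src)];
// alternatively, use the iterator version of parse() by passing the source directly
const doc2 = [...sax.parse(src)];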
Partial parsing & result post-processing
As mentioned earlier, the transducer nature of this parser allows for
its easy integration into larger transformation pipelines. The next
example parses an SVG file, then extracts and selectively applies
transformations to only the <circle> elements in the first group (<g>)
element. The transformed elements can be serialized back into SVG syntax
using @thi.ng/hiccup.
Given the composed transducer below, parsing stops immediately after the
first <g> element is complete. This is because the matchFirst()
transducer will cause early termination once that element has been
processed.
import { parse, Type } from "@thi.ng/sax";
import * as tx from "@thi.ng/transducers";
const svg = `
<?xml version="1.0"?>
<svg version="1.1" height="300" width="300" xmlns="http://www.w3.org/2000/svg">
<g fill="yellow">
<circle cx="50.00" cy="150.00" r="50.00" />
<circle cx="250.00" cy="150.00" r="50.00" />
<circle cx="150.00" cy="150.00" fill="rgba(0,255,255,0.25)" r="100.00" stroke="#ff0000" />
<rect x="80" y="80" width="140" height="140" fill="none" stroke="black" />
</g>
<g fill="none" stroke="black">
<circle cx="150.00" cy="150.00" r="50.00" />
<circle cx="150.00" cy="150.00" r="25.00" />
</g>
</svg>`;
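[...tx.iterator(
    tx.comp(
        // emit children as part of ELEM_END events (the default)
        parse({ children: true }),
        // stop once the first <g> element is complete
        tx.matchFirst((e) => e.type === Type.ELEM_END && e.tag === "g"),
        // flatten that group's children into the stream
        tx.mapcat((e) => e.children),
        // keep only <circle> elements
        tx.filter((e) => e.tag === "circle"),
        // convert the geometry attributes to numbers
        tx.map((e) => [e.tag, {
            ...e.attribs,
            cx: parseFloat(e.attribs.cx),
            cy: parseFloat(e.attribs.cy),
            r: parseFloat(e.attribs.r),
        }])
    ),
    svg
)]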
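The early-termination behaviour can also be illustrated without the parser itself. The following is a minimal sketch, assuming a lazy event source modelled as a generator; `numberedEvents` and `firstMatch` are hypothetical helpers written for this illustration and are not part of @thi.ng/sax or @thi.ng/transducers. It only shows the principle that the source is no longer consumed once the first match has been found.

```typescript
// Counts how many items were actually pulled from the lazy source.
let pulled = 0;

// Hypothetical stand-in for a lazy stream of parse events.
function* numberedEvents(tags: string[]): Generator<{ tag: string; index: number }> {
  let index = 0;
  for (const tag of tags) {
    pulled++;
    yield { tag, index: index++ };
  }
}

// Returns the first item satisfying `pred` and stops consuming `src`.
function firstMatch<T>(src: Iterable<T>, pred: (x: T) => boolean): T | undefined {
  for (const x of src) {
    // returning from inside for...of closes the generator
    if (pred(x)) return x;
  }
  return undefined;
}

const hit = firstMatch(
  numberedEvents(["svg", "g", "circle", "g"]),
  (e) => e.tag === "g"
);
console.log(hit, pulled);
```

Only the items up to and including the first match are pulled; the remaining input is never touched, which is the property that makes partial parsing of large documents cheap.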
DOM-style tree parsing using defmulti
This example shows how SVG can be parsed into
@thi.ng/hiccup
format.
import { defmulti, DEFAULT } from "@thi.ng/defmulti";
import { parse } from "@thi.ng/sax";
import * as tx from "@thi.ng/transducers";
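// copies all attributes and converts the given ones to numbers
const numericAttribs = (e, ...ids: string[]) =>
    ids.reduce(
        (acc, id) => (acc[id] = parseFloat(e.attribs[id]), acc),
        { ...e.attribs }
    );
// recursively parses child elements, dropping unsupported ones
const parsedChildren = (e) =>
    tx.iterator(
        tx.comp(
            tx.map(parseElement),
            tx.filter((e) => !!e),
        ),
        e.children
    );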
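// dispatch on the element's tag name
const parseElement = defmulti((e) => e.tag);
parseElement.add("circle", (e) =>
    [e.tag, numericAttribs(e, "cx", "cy", "r")]);
parseElement.add("rect", (e) =>
    [e.tag, numericAttribs(e, "x", "y", "width", "height")]);
parseElement.add("g", (e) =>
    [e.tag, e.attribs, ...parsedChildren(e)]);
parseElement.add("svg", (e) =>
    [e.tag, numericAttribs(e, "width", "height"), ...parsedChildren(e)]);
// elements without a dedicated implementation are dropped
parseElement.add(DEFAULT, () => null);
// uses the same `svg` source as in the previous example
// last() returns the final event, here the root element's ELEM_END incl. all children
parseElement(tx.transduce(parse(), tx.last(), svg));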
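Where no per-tag handling is needed, the same tree walk can be written as a plain recursive function. This is a minimal sketch, not the package's own conversion: `ElemLike` is an assumed shape based on the `tag`, `attribs` and `children` keys used in the examples above, and `toHiccup` is a hypothetical helper. The sample tree is hand-written rather than actual parser output.

```typescript
// Assumed element shape (only the keys used in the examples above).
interface ElemLike {
  tag: string;
  attribs: Record<string, string>;
  children: ElemLike[];
}

// A hiccup node: tag name, attribute object, then any child nodes.
type HiccupLike = [string, Record<string, string>, ...unknown[]];

// Recursively converts an element tree into nested hiccup arrays.
function toHiccup(e: ElemLike): HiccupLike {
  return [e.tag, { ...e.attribs }, ...e.children.map((c) => toHiccup(c))];
}

// Hand-written sample tree (not produced by the parser).
const sampleTree: ElemLike = {
  tag: "g",
  attribs: { fill: "yellow" },
  children: [
    { tag: "circle", attribs: { cx: "50.00", cy: "150.00", r: "50.00" }, children: [] },
  ],
};

const hiccupTree = toHiccup(sampleTree);
console.log(JSON.stringify(hiccupTree));
```

Unlike the defmulti version, this variant keeps every element and leaves all attribute values as strings.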
Error handling
If the parser encounters a syntax error, it emits an error event
containing a description and the input position, then stops the entire
transducer pipeline. No JS error is thrown.
import { parse } from "@thi.ng/sax";
import { iterator } from "@thi.ng/transducers";
// stray text outside of any element
[...iterator(parse(), `a`)]
// closing tag does not match the open element
[...iterator(parse(), `<a><b></c></a>`)]
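Because errors arrive as ordinary values, calling code has to check for them explicitly. The following is a minimal sketch of such a check. It assumes that events are plain objects with a numeric `type` key and that the description is stored under a `body` key (an assumption, not confirmed by this README). `findError` and `ResultLike` are hypothetical names, and the sample events are hand-written, not real parser output.

```typescript
// Numeric ID of Type.ERROR, as listed in this README's type ID table.
const ERROR_TYPE_ID = 7;

// Assumed minimal event shape; `body` as the description key is an assumption.
interface ResultLike {
  type: number;
  body?: string;
}

// Returns the first error event, or undefined if parsing succeeded.
function findError(events: Iterable<ResultLike>): ResultLike | undefined {
  for (const e of events) {
    if (e.type === ERROR_TYPE_ID) return e;
  }
  return undefined;
}

// Hand-written sample: two element starts followed by an error event.
const sampleEvents: ResultLike[] = [
  { type: 4 },
  { type: 4 },
  { type: 7, body: "example error description" },
];

const parseError = findError(sampleEvents);
if (parseError) console.log(`parse failed: ${parseError.body}`);
```

Since the pipeline stops at the first error, an error event is expected to be the last value in a collected result.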
Emitted result type IDs
The type key in each emitted result object is a TypeScript enum with the following values:
ID | Enum | Description |
---|---|---|
0 | Type.PROC | Processing instruction incl. attribs |
1 | Type.DOCTYPE | Doctype declaration body |
2 | Type.COMMENT | Comment body |
3 | Type.CDATA | CDATA content |
4 | Type.ELEM_START | Element start incl. attributes |
5 | Type.ELEM_END | Element end incl. attributes, body & children |
6 | Type.ELEM_BODY | Element text body |
7 | Type.ERROR | Parse error description |
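When logging or debugging, the numeric IDs are easier to read once they are mapped back to their names. The following is a minimal sketch that mirrors the table in a local lookup array instead of importing the package's enum. `TYPE_ID_NAMES`, `typeIdName` and `countByTypeName` are hypothetical names, and the sample events are hand-written.

```typescript
// Names in ID order, copied from the table above (array index === numeric ID).
const TYPE_ID_NAMES: readonly string[] = [
  "PROC", "DOCTYPE", "COMMENT", "CDATA",
  "ELEM_START", "ELEM_END", "ELEM_BODY", "ERROR",
];

// Maps a numeric type ID to its enum member name.
function typeIdName(id: number): string {
  return TYPE_ID_NAMES[id] ?? `UNKNOWN(${id})`;
}

// Counts events per type name, e.g. to summarize a parse result in a log.
function countByTypeName(events: Iterable<{ type: number }>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    const key = typeIdName(e.type);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Hand-written sample events (only the `type` key matters here).
const typeSummary = countByTypeName([{ type: 4 }, { type: 6 }, { type: 5 }, { type: 6 }]);
console.log(typeSummary);
```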
Parser options
Option | Type | Default | Description |
---|---|---|---|
children | boolean | true | If true, recursively includes child elements in ELEM_END events. For very large documents, this should be disabled to save (or even fit into) memory. |
entities | boolean | false | If true, unescapes standard XML entities in body text and attrib values. |
trim | boolean | false | If true, trims element body, comments and CDATA content. If the remaining string is empty, no event will be generated for this value. |
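To make the effect of the `entities` and `trim` options more concrete, here is a minimal sketch of the described behaviour applied to a single text value. It is not the parser's implementation. It assumes that "standard XML entities" means the five predefined XML entities and that trimming happens before unescaping. `unescapeEntities`, `normalizeBody` and `BodyOpts` are hypothetical names.

```typescript
// The five predefined XML entities (assumed to be the "standard" set).
const XML_ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&apos;": "'",
};

// Replaces each entity in a single pass, so "&amp;lt;" becomes "&lt;" and not "<".
function unescapeEntities(s: string): string {
  return s.replace(/&(?:amp|lt|gt|quot|apos);/g, (m) => XML_ENTITIES[m] ?? m);
}

interface BodyOpts {
  entities: boolean;
  trim: boolean;
}

// Returns the processed text, or undefined where no event would be generated.
function normalizeBody(body: string, opts: BodyOpts): string | undefined {
  const res = opts.trim ? body.trim() : body;
  // with trim enabled, an empty remainder produces no value at all
  if (opts.trim && res.length === 0) return undefined;
  return opts.entities ? unescapeEntities(res) : res;
}

console.log(normalizeBody("  a &lt; b  ", { entities: true, trim: true }));
console.log(normalizeBody("   ", { entities: false, trim: true }));
```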
Authors
If this project contributes to an academic publication, please cite it as:
@misc{thing-sax,
title = "@thi.ng/sax",
author = "Karsten Schmidt",
note = "https://thi.ng/sax",
year = 2018
}
License
© 2018 - 2024 Karsten Schmidt // Apache License 2.0