Comparing version 1.4.0 to 1.4.1
{
"name": "sax-wasm",
"version": "1.4.0",
"version": "1.4.1",
"repository": "https://github.com/justinwilaby/sax-wasm",
@@ -5,0 +5,0 @@ "description": "An extremely fast JSX, HTML and XML parser written in Rust compiled to WebAssembly for Node and the Web",
@@ -10,9 +10,9 @@ # SAX (Simple API for XML) for WebAssembly
Sax Wasm is a sax style parser for XML, HTML and JSX written in [Rust](https://www.rust-lang.org/en-US/), compiled for
WebAssembly with the sole motivation to bring **near native speeds** to XML and JSX parsing for node and the web.
Inspired by [sax js](https://github.com/isaacs/sax-js) and rebuilt with Rust for WebAssembly, sax-wasm brings optimizations
for speed and support for JSX syntax.
Suitable for [LSP](https://langserver.org/) implementations, sax-wasm provides line numbers and character positions within the
document for elements, attributes and text nodes, which provide the raw building blocks for linting, transpilation and lexing.
@@ -27,2 +27,3 @@
const fs = require('fs');
const path = require('path');
const { SaxEventType, SAXParser } = require('sax-wasm');
@@ -34,7 +35,7 @@
// Instantiate
const options = {highWaterMark: 32 * 1024}; // 32k chunks
const parser = new SAXParser(SaxEventType.Attribute | SaxEventType.OpenTag, options);
parser.eventHandler = (event, data) => {
if (event === SaxEventType.Attribute ) {
if (event === SaxEventType.Attribute) {
// process attribute
@@ -49,3 +50,4 @@ } else {
if (ready) {
const readable = fs.createReadStream(path.resolve(__dirname + '/path/to/doument.xml'), options);
// stream from a file in the current directory
const readable = fs.createReadStream(path.resolve(path.resolve('.', 'path/to/document.xml')), options);
readable.on('data', (chunk) => {
@@ -57,3 +59,2 @@ parser.write(chunk);
});
```
@@ -69,3 +70,3 @@ ## Usage for the web
const parser = new SAXParser(SaxEventType.Attribute | SaxEventType.OpenTag, {highWaterMark: 64 * 1024}); // 64k chunks
// Instantiate and prepare the wasm for parsing
@@ -88,3 +89,3 @@ const ready = await parser.prepareWasm(new Uint8Array(saxWasmbuffer));
}
fetch('path/to/document.xml').then(async response => {
@@ -112,3 +113,3 @@ if (!response.ok) {
1. JSX is supported including JSX fragments. Things like `<foo bar={this.bar()}></foo>` and `<><foo/><bar/></>` will parse as expected.
1. No attempt is made to validate the document. sax-wasm reports what it sees. If you need strict mode or document validation, it may
be recreated by applying rules to the events that are reported by the parser.
@@ -118,9 +119,9 @@ 1. Namespaces are reported in attributes. No special events dedicated to namespaces.
## Streaming
Streaming is supported with sax-wasm by writing UTF-8 encoded bytes (`Uint8Array`) to the parser instance. Writes can occur safely
anywhere except within the `eventHandler` function or within the `eventTrap` (when extending the `SAXParser` class).
Doing so anyway risks overwriting memory still in play.
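One way to honor the restriction above is to queue any chunk that is requested while a handler is running and write it only after the current write has returned. The following is a minimal sketch under stated assumptions: `StubParser` is a stand-in that calls its handler synchronously from `write`, and `makeSafeWriter` is a hypothetical helper; neither is part of sax-wasm.

```javascript
// Stand-in parser for illustration only (NOT the real SAXParser):
// `write` calls the handler synchronously and tracks re-entrancy depth.
class StubParser {
  constructor() {
    this.eventHandler = null;
    this.written = [];
    this.depth = 0;
    this.maxDepth = 0;
  }
  write(chunk) {
    this.depth += 1;
    this.maxDepth = Math.max(this.maxDepth, this.depth);
    this.written.push(chunk);
    if (this.eventHandler) this.eventHandler('chunk', chunk);
    this.depth -= 1;
  }
}

// Hypothetical helper: chunks requested during a handler call are queued
// and flushed only after the in-progress write has returned.
function makeSafeWriter(parser) {
  const pending = [];
  let writing = false;
  return function safeWrite(chunk) {
    pending.push(chunk);
    if (writing) return; // re-entrant call: leave the chunk queued
    writing = true;
    try {
      while (pending.length > 0) parser.write(pending.shift());
    } finally {
      writing = false;
    }
  };
}

const stubParser = new StubParser();
const safeWrite = makeSafeWriter(stubParser);
let injected = false;
stubParser.eventHandler = () => {
  if (!injected) {
    injected = true;
    safeWrite(new Uint8Array([0x3c, 0x62, 0x2f, 0x3e])); // "<b/>" is queued
  }
};
safeWrite(new Uint8Array([0x3c, 0x61, 0x2f, 0x3e])); // "<a/>"
console.log(stubParser.written.length, stubParser.maxDepth); // 2 1
```

Both chunks reach the parser, but `write` is never entered while another `write` is still on the stack.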
## Events
Events are subscribed to using a bitmask composed from flags representing the event type.
Setting a bit position in this 12-bit integer tells the parser to emit events of the corresponding type.
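As a rough illustration of how such a bitmask works, the sketch below composes a mask with bitwise OR and tests membership with bitwise AND. The `EventFlag` object and its numeric values are hypothetical and chosen only for this example; the real flags are exposed on `SaxEventType` and may use different values.

```javascript
// Hypothetical flag values for illustration only; the real values live on
// the SaxEventType export of sax-wasm and may differ.
const EventFlag = {
  Text: 0b0001,
  Attribute: 0b0010,
  OpenTag: 0b0100,
  CloseTag: 0b1000,
};

// Subscribe to a set of events by OR-ing their flags together.
const mask = EventFlag.Text | EventFlag.OpenTag | EventFlag.Attribute;

// A single event is checked with a bitwise AND against the mask.
function isSubscribed(subscription, flag) {
  return (subscription & flag) !== 0;
}

console.log(isSubscribed(mask, EventFlag.OpenTag));  // true
console.log(isSubscribed(mask, EventFlag.CloseTag)); // false
```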
@@ -150,14 +151,14 @@ For example, passing in the following bitmask to the parser instructs it to emit events for text, open tags and attributes:
## Speeding things up on large documents
The sax-wasm parser is incredibly fast and can parse very large documents in the blink of an eye. Although
its performance out of the box is ridiculous, the JavaScript thread *must* be involved with transforming raw
bytes to human readable data, so there are times where slowdowns can occur if you're not careful. These are some of the
items to consider when top speed and performance are an absolute must:
1. Stream your document from its source as a `Uint8Array` - This is covered in the examples above. Things slow down
significantly when the document is loaded in JavaScript as a string, then encoded to bytes using `Buffer.from(document)` or
`new TextEncoder().encode(document)` before being passed to the parser. Encoding on the JavaScript thread adds a non-trivial
amount of overhead, so it's best to keep the data as raw bytes. Streaming often means the parser will already be done once
the document finishes downloading!
1. Keep the events bitmask to a bare minimum whenever possible - the more events that are required, the more work the
JavaScript thread must do once sax-wasm.wasm reports back.
1. Limit property reads on the reported data to only what's necessary - this includes things like stringifying the data to
JSON using `JSON.stringify()`. The first read of a property on a data object reported by the `eventHandler` will
@@ -181,10 +182,10 @@ retrieve the value from raw bytes and convert it to a `string`, `number` or `Position` on the JavaScript thread. This
- `prepareWasm(wasm: Uint8Array): Promise<boolean>` - Instantiates the wasm binary with reasonable defaults and stores
the instance as a member of the class. Always resolves to true or throws if something went wrong.
- `write(chunk: Uint8Array, offset: number = 0): void;` - writes the supplied bytes to the wasm memory buffer and kicks
off processing. An optional offset can be provided if the read should occur at an index other than `0`. **NOTE:**
The `line` and `character` counters are *not* reset.
- `end(): void;` - Ends processing for the stream. The `line` and `character` counters are reset to zero and the parser is
readied for the next document.
@@ -196,3 +197,3 @@
- `eventHandler` - A function reference used for event handling. The supplied function must have a signature that accepts
2 arguments: the `event`, which is one of the `SaxEventType` values, and the `body` (listed in the table above)
@@ -208,7 +209,7 @@
- `write(ptr: *mut u8, length: usize)` - Supplies the parser with the location and length of the newly written bytes in the
stream and kicks off processing. The parser assumes that the bytes are valid UTF-8. Writing non-UTF-8 bytes may cause
unpredictable results but probably will not break.
- `end()` - resets the `character` and `line` counts but does not halt processing of the current buffer.
@@ -241,3 +242,3 @@ ## Building from source
```
The project can now be built using:
```bash
@@ -244,0 +245,0 @@ npm run build
Sorry, the diff of this file is not supported yet