sax-wasm

An extremely fast JSX, HTML and XML parser written in Rust compiled to WebAssembly for Node and the Web

  • Version: 1.2.0
  • Weekly downloads: 5.3K (decreased by 18.24%)
  • Maintainers: 1

SAX (Simple API for XML) for WebAssembly

When you absolutely, positively have to have the fastest parser in the room, accept no substitutes.

The first streamable, low memory XML, HTML, and JSX parser for WebAssembly.

Sax Wasm is a SAX-style parser for XML, HTML and JSX written in Rust and compiled to WebAssembly, with the sole motivation of bringing near-native speeds to XML and JSX parsing for Node and the web. Inspired by sax js and rebuilt with Rust for WebAssembly, sax-wasm brings optimizations for speed and support for JSX syntax.

Suitable for LSP implementations, sax-wasm provides line numbers and character positions within the document for elements, attributes and text nodes, which provide the raw building blocks for linting, transpilation and lexing.

Installation

npm i sax-wasm

Usage in Node

const fs = require('fs');
const { SaxEventType, SAXParser } = require('sax-wasm');

// Get the path to the WebAssembly binary and load it
const saxPath = require.resolve('sax-wasm/lib/sax-wasm.wasm');
const saxWasmBuffer = fs.readFileSync(saxPath);

// Instantiate the parser with the events to subscribe to
const parser = new SAXParser(SaxEventType.Attribute | SaxEventType.OpenTag);
parser.eventHandler = (event, data) => {
  if (event === SaxEventType.Attribute) {
    // process attribute
  } else {
    // process open tag
  }
};

// Instantiate and prepare the wasm for parsing
parser.prepareWasm(saxWasmBuffer).then(ready => {
  if (ready) {
    parser.write('<div class="modal"></div>');
    parser.end();
  }
});

Usage for the web

import { SaxEventType, SAXParser } from 'sax-wasm';

async function loadAndPrepareWasm() {
  const saxWasmResponse = await fetch('./path/to/wasm/sax-wasm.wasm');
  const saxWasmBuffer = await saxWasmResponse.arrayBuffer();
  const parser = new SAXParser(SaxEventType.Attribute | SaxEventType.OpenTag);

  // Instantiate and prepare the wasm for parsing
  const ready = await parser.prepareWasm(new Uint8Array(saxWasmBuffer));
  if (ready) {
    return parser;
  }
}

loadAndPrepareWasm().then(processDocument);

function processDocument(parser) {
  parser.eventHandler = (event, data) => {
    if (event === SaxEventType.Attribute) {
      // process attribute
    } else {
      // process open tag
    }
  };
  parser.write('<div class="modal"></div>');
  parser.end();
}

Differences from sax-js

Besides being incredibly fast, there are some notable differences between sax-wasm and sax-js that may affect some users when migrating:

  1. JSX is supported, including JSX fragments. Things like <foo bar={this.bar()}></foo> and <><foo/><bar/></> will parse as expected.
  2. No attempt is made to validate the document. sax-wasm reports what it sees. If you need strict mode or document validation, it may be recreated by applying rules to the events that are reported by the parser.
  3. Namespaces are reported in attributes. There are no special events dedicated to namespaces.
  4. The parser is ready as soon as the promise returned by prepareWasm resolves.
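
As an illustration of item 2, a well-formedness rule can be layered on top of the reported events. The sketch below is not part of sax-wasm: createBalanceChecker is a hypothetical helper, the two flags are local copies of the documented mask values, and it assumes the tag payload exposes its element name as tag.name. The events are simulated by hand, and how self-closing tags are reported is not covered here.

```javascript
// Local stand-ins for two SaxEventType flags, using the documented mask values.
const OPEN_TAG = 0b000010000000;
const CLOSE_TAG = 0b000100000000;

// Hypothetical helper: tracks open tags on a stack and records mismatches.
// ASSUMPTION: the tag payload exposes its element name as `tag.name`.
function createBalanceChecker() {
  const stack = [];
  const errors = [];
  const eventHandler = (event, tag) => {
    if (event === OPEN_TAG) {
      stack.push(tag.name);
    } else if (event === CLOSE_TAG) {
      const expected = stack.pop();
      if (expected === undefined) {
        errors.push(`unexpected </${tag.name}>`);
      } else if (expected !== tag.name) {
        errors.push(`expected </${expected}> but saw </${tag.name}>`);
      }
    }
  };
  // Call after the document has ended to include tags that never closed.
  const finish = () => errors.concat(stack.map(name => `unclosed <${name}>`));
  return { eventHandler, finish };
}

// Simulated events for <foo></bar>; with a real parser the handler would be
// assigned to parser.eventHandler instead of being called by hand.
const checker = createBalanceChecker();
checker.eventHandler(OPEN_TAG, { name: 'foo' });
checker.eventHandler(CLOSE_TAG, { name: 'bar' });
console.log(checker.finish());
```

Keeping the rule outside the parser means each consumer can decide how strict to be without paying for validation it does not need.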

Streaming

Streaming is supported with sax-wasm by writing UTF-8 encoded text to the parser instance. Writes can occur safely anywhere except within the eventHandler function or within the eventTrap (when extending the SAXParser class). Writing from either of those places risks overwriting memory that is still in use.
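
A minimal sketch of feeding a document in pieces, assuming only the write(string) and end() methods described in this README. chunkText and writeInChunks are hypothetical helpers, and the recorder object is a stand-in for a prepared parser so the snippet runs on its own. The chunker avoids splitting a surrogate pair so that every piece still encodes to valid UTF-8.

```javascript
// Splits `text` into pieces of about `chunkSize` UTF-16 code units without
// cutting a surrogate pair in half, so each piece encodes to valid UTF-8.
function* chunkText(text, chunkSize) {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    const last = text.charCodeAt(end - 1);
    // A high surrogate at the end of a chunk needs its low surrogate too.
    if (end < text.length && last >= 0xd800 && last <= 0xdbff) {
      end += 1;
    }
    yield text.slice(start, end);
    start = end;
  }
}

// Feeds a document to any object that has write(string) and end() methods.
// All writes happen here, outside of any event handler.
function writeInChunks(parser, text, chunkSize = 1024) {
  for (const chunk of chunkText(text, chunkSize)) {
    parser.write(chunk);
  }
  parser.end();
}

// Stand-in for a prepared parser; it only records what it is given.
const recorder = {
  chunks: [],
  ended: false,
  write(chunk) { this.chunks.push(chunk); },
  end() { this.ended = true; },
};
writeInChunks(recorder, '<div class="modal">\u{1F600}</div>', 8);
console.log(recorder.chunks);
```

In real use the chunks would typically come from a file or network stream rather than from slicing a string already in memory.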

Events

Events are subscribed to using a bitmask composed from flags representing the event type. Each bit position in a 12-bit integer can be set to tell the parser to emit events of the corresponding type. For example, passing the following bitmask to the parser instructs it to emit events for text, open tags and attributes:

import { SaxEventType } from 'sax-wasm';
parser.events = SaxEventType.Text | SaxEventType.OpenTag | SaxEventType.Attribute;

Complete list of event/argument pairs:

Event                               Mask            Argument passed to handler
SaxEventType.Text                   0b000000000001  text: Text
SaxEventType.ProcessingInstruction  0b000000000010  procInst: string
SaxEventType.SGMLDeclaration        0b000000000100  sgmlDecl: string
SaxEventType.Doctype                0b000000001000  doctype: string
SaxEventType.Comment                0b000000010000  comment: string
SaxEventType.OpenTagStart           0b000000100000  tag: Tag
SaxEventType.Attribute              0b000001000000  attribute: Attribute
SaxEventType.OpenTag                0b000010000000  tag: Tag
SaxEventType.CloseTag               0b000100000000  tag: Tag
SaxEventType.OpenCDATA              0b001000000000  start: Position
SaxEventType.CDATA                  0b010000000000  cdata: string
SaxEventType.CloseCDATA             0b100000000000  end: Position
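
To make the bit arithmetic concrete, here is a small standalone sketch. The SaxEventType object below is a local copy of the mask values listed above, not the one exported by the package, and describeMask is a hypothetical helper for inspecting a mask.

```javascript
// Local copy of the documented mask values, keyed by event name.
const SaxEventType = {
  Text: 0b000000000001,
  ProcessingInstruction: 0b000000000010,
  SGMLDeclaration: 0b000000000100,
  Doctype: 0b000000001000,
  Comment: 0b000000010000,
  OpenTagStart: 0b000000100000,
  Attribute: 0b000001000000,
  OpenTag: 0b000010000000,
  CloseTag: 0b000100000000,
  OpenCDATA: 0b001000000000,
  CDATA: 0b010000000000,
  CloseCDATA: 0b100000000000,
};

// Hypothetical helper: names of every event enabled in `mask`, lowest bit first.
function describeMask(mask) {
  return Object.keys(SaxEventType).filter(name => (mask & SaxEventType[name]) !== 0);
}

// Combine flags with bitwise OR to subscribe to several events at once.
const mask = SaxEventType.Text | SaxEventType.OpenTag | SaxEventType.Attribute;
console.log(mask.toString(2).padStart(12, '0')); // 000011000001
console.log(describeMask(mask)); // [ 'Text', 'Attribute', 'OpenTag' ]

// Clear a single flag with AND NOT, leaving the others untouched.
const withoutText = mask & ~SaxEventType.Text;
console.log(describeMask(withoutText)); // [ 'Attribute', 'OpenTag' ]
```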

SAXParser.js

Methods

  • prepareWasm(wasm: Uint8Array): Promise<boolean> - Instantiates the wasm binary with reasonable defaults and stores the instance as a member of the class. Always resolves to true or throws if something went wrong.

  • write(buffer: string): void; - Writes the supplied string to the wasm stream and kicks off processing. The character and line counters are not reset.

  • end(): void; - Ends processing for the stream. The character and line counters are reset to zero and the parser is readied for the next document.

Properties

  • events - A bitmask containing the events to subscribe to. See the examples for creating the bitmask.

  • eventHandler - A function reference used for event handling. The supplied function must accept two arguments: the event, which is one of the SaxEventType values, and the body (listed in the table above).
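
One way to organize a two-argument handler is a lookup table keyed by event flag, sketched below. createDispatcher is a hypothetical helper, the constants are local copies of the documented mask values, and the handler is invoked by hand with placeholder bodies rather than by a real parser.

```javascript
// Mask values copied from the Events table; only the three used here.
const TEXT = 0b000000000001;
const ATTRIBUTE = 0b000001000000;
const OPEN_TAG = 0b000010000000;

// Hypothetical helper: builds an (event, body) handler that routes each
// event to the callback registered for its flag and ignores the rest.
function createDispatcher(callbacks) {
  const table = new Map(callbacks);
  return (event, body) => {
    const callback = table.get(event);
    if (callback) {
      callback(body);
    }
  };
}

const seen = [];
const eventHandler = createDispatcher([
  [ATTRIBUTE, body => seen.push(['attribute', body])],
  [OPEN_TAG, body => seen.push(['openTag', body])],
]);

// Simulated calls with placeholder bodies. With a real parser this would be
// wired up as: parser.eventHandler = eventHandler;
eventHandler(ATTRIBUTE, 'placeholder attribute body');
eventHandler(TEXT, 'no callback registered, so this is ignored');
console.log(seen);
```

Compared with an if/else chain, the table keeps each event's logic in its own function and makes unsubscribed events a silent no-op.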

sax-wasm.wasm

Methods

  • parser(events: u32) - Prepares the parser struct internally and supplies it with the specified events bitmask. Changing the events bitmask can be done at any time during processing using this method.

  • write(ptr: *mut u8, length: usize) - Supplies the parser with the location and length of the newly written bytes in the stream and kicks off processing. The parser assumes that the bytes are valid UTF-8. Writing non-UTF-8 bytes may cause unpredictable behavior.
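
The length passed to write is a byte count, not a character count. The sketch below illustrates the difference using only standard JavaScript APIs. stageUtf8 is a hypothetical helper, the standalone WebAssembly.Memory stands in for the module's memory, and offset 0 is an arbitrary choice; how the real wrapper picks the pointer is not described here.

```javascript
// Hypothetical helper: encodes `text` as UTF-8 and copies it into `memory`
// starting at byte offset `ptr`. Returns the pointer and byte length that a
// write(ptr, length) style call would need.
function stageUtf8(memory, text, ptr = 0) {
  const bytes = new TextEncoder().encode(text);
  const view = new Uint8Array(memory.buffer);
  if (ptr + bytes.length > view.length) {
    throw new RangeError('text does not fit in memory');
  }
  view.set(bytes, ptr);
  return { ptr, length: bytes.length };
}

// ASSUMPTION: a standalone one-page (64 KiB) memory instead of the module's own.
const memory = new WebAssembly.Memory({ initial: 1 });
const text = '<p>caf\u00e9</p>';
const staged = stageUtf8(memory, text);

// 11 UTF-16 code units, but 12 bytes because the accented e takes two bytes.
console.log(text.length, staged.length); // 11 12
```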

  • end() - Resets the character and line counts but does not halt processing of the current buffer.

Building from source

Prerequisites

This project requires Rust v1.30+, since that release supports the wasm32-unknown-unknown target out of the box.

Install rust:

curl https://sh.rustup.rs -sSf | sh

Install the stable compiler and switch to it.

rustup install stable
rustup default stable

Install the wasm32-unknown-unknown target.

rustup target add wasm32-unknown-unknown --toolchain stable

Install node with npm then run the following command from the project root.

npm install

Install the wasm-bindgen-cli tool:

cargo install wasm-bindgen-cli

The project can now be built using:

npm run build

The artifacts from the build will be located in the /lib directory.

Package last updated on 04 Feb 2019
