
xml-tokenizer
Straightforward and typesafe XML tokenizer that streams tokens through a callback mechanism
Status: Experimental
xml-tokenizer is a straightforward and typesafe XML tokenizer that streams tokens through a callback mechanism.
The implementation is based on the roxmltree tokenizer.rs. See the FAQ for why we did not embed the roxmltree crate as WASM.
- SAX approach
- Slower than txml, but it's still twice as fast as fast-xml-parser

Create a typesafe, straightforward, and lightweight XML parser. Many existing parsers either lack TypeScript support, aren't actively maintained, or exceed 20kB gzipped.
My goal was to develop an efficient & flexible alternative by porting roxmltree to TypeScript or integrating it via WASM. While it functions well and is quite versatile due to its streaming approach, it's not as fast as I hoped.
import { select, tokenize, xmlToObject, xmlToSimplifiedObject } from 'xml-tokenizer';

// Parse XML to a JavaScript object without information loss (uses `tokenize` under the hood)
const xmlObject = xmlToObject('<p>Hello World</p>');

// Or, parse XML to an easy-to-query JavaScript object
const simplifiedXmlObject = xmlToSimplifiedObject('<p>Hello World</p>');

// Or, parse XML to a stream of tokens
tokenize('<p>Hello World</p>', (token) => {
  switch (token.type) {
    case 'ElementStart':
      console.log('Start of element:', token);
      break;
    case 'Text':
      console.log('Text content:', token.text);
      break;
    // Handle other token types as needed
    default:
      console.log('Token:', token);
  }
});

// Or, stream only a selection of tokens
// (`xml` is assumed to be a string holding the XML document to query)
select(
  xml,
  [
    [
      { axis: 'child', local: 'bookstore' },
      { axis: 'child', local: 'book', attributes: [{ local: 'category', value: 'COOKING' }] }
    ]
  ],
  (selectedToken) => {
    // Handle selected token
  }
);
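
The `switch (token.type)` pattern in the `tokenize` callback is typesafe when the token is a discriminated union: TypeScript narrows each `case` to the matching shape. The following is a minimal sketch of that idea. `SketchToken`, its fields, and `describeToken` are hypothetical names for illustration, not the library's actual type definitions.

```typescript
// Hypothetical token shapes for illustration only; the real xml-tokenizer
// types may use different names and fields.
type SketchToken =
  | { type: 'ElementStart'; local: string }
  | { type: 'Text'; text: string }
  | { type: 'ElementEnd' };

// Turn a token into a short label. Inside each `case`, TypeScript knows
// which fields exist, so `token.text` is only accessible for 'Text'.
function describeToken(token: SketchToken): string {
  switch (token.type) {
    case 'ElementStart':
      return `start:${token.local}`;
    case 'Text':
      return `text:${token.text}`;
    case 'ElementEnd':
      return 'end';
    default: {
      // If a new variant is added but not handled above, this assignment
      // fails to compile, which keeps the switch exhaustive.
      const unreachable: never = token;
      return unreachable;
    }
  }
}

const sampleTokens: SketchToken[] = [
  { type: 'ElementStart', local: 'p' },
  { type: 'Text', text: 'Hello World' },
  { type: 'ElementEnd' }
];

const described: string[] = sampleTokens.map((token) => describeToken(token));
console.log(described);
```

The `never` check in the `default` branch is optional, but it turns a forgotten token type into a compile error instead of a silent fallthrough.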
The following token types are supported:
- Processing instruction: `<?target content?>`
- Comment: `<!-- text -->`
- Entity declaration: `<!ENTITY ns_extend "http://test.com">`
- Element start: `<ns:elem`
- Attribute: `ns:attr="value"`
- Element end: `>` (open), `</ns:name>` (close), or `/>` (empty)
- Text: text content
- CDATA section: `<![CDATA[text]]>`

Attributes must have the form `Name="Value"`. An attribute without a value is not valid XML; it is parsed as `true` (e.g., `<element attribute/>` is parsed as `attribute="true"`).

The performance of xml-tokenizer was benchmarked against other popular XML parsers. These tests focus on XML to object conversion and node counting. Interestingly, the version of xml-tokenizer imported directly from npm performed significantly better. The reason for this discrepancy is unclear, but the results seem accurate based on external testing.

XML to object conversion:

| Parser | Operations per Second (ops/sec) | Min Time (ms) | Max Time (ms) | Mean Time (ms) | Relative Margin of Error (rme) |
|---|---|---|---|---|---|
| xml-tokenizer | 46.87 | 19.47 | 24.57 | 21.33 | ±2.06% |
| xml-tokenizer (dist) | 53.70 | 17.31 | 25.20 | 18.62 | ±3.28% |
| xml-tokenizer (npm) | 163.00 | 5.03 | 8.50 | 6.13 | ±2.32% |
| fast-xml-parser | 66.00 | 14.01 | 20.73 | 15.15 | ±3.34% |
| txml | 234.52 | 3.38 | 7.61 | 4.26 | ±4.00% |
| xml2js | 36.21 | 25.58 | 37.28 | 27.61 | ±4.39% |

Node counting:

| Parser | Operations per Second (ops/sec) | Min Time (ms) | Max Time (ms) | Mean Time (ms) | Relative Margin of Error (rme) |
|---|---|---|---|---|---|
| xml-tokenizer | 53.03 | 18.30 | 19.45 | 18.86 | ±0.81% |
| xml-tokenizer (npm) | 166.61 | 5.62 | 7.16 | 6.00 | ±0.88% |
| saxen | 500.99 | 1.83 | 4.79 | 2.00 | ±1.52% |
| sax | 64.44 | 14.96 | 16.34 | 15.52 | ±0.67% |
The benchmarks can be found in the __tests__ directory and can be executed by running:
pnpm run bench
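
The table columns are related: ops/sec is 1000 divided by the mean time in milliseconds (for example, 1000 / 6.13 ≈ 163 for the npm row), and rme expresses the margin of error as a percentage of the mean. The sketch below shows one way such a summary could be derived from raw per-run timings. The `summarize` helper and the 1.96 critical value are assumptions for illustration; the benchmark tool used here may compute rme with a t-distribution table instead.

```typescript
// Summary of per-run timings, mirroring the columns in the benchmark tables.
interface BenchSummary {
  opsPerSec: number;
  min: number;
  max: number;
  mean: number;
  rme: number; // relative margin of error, in percent
}

// Hypothetical helper: summarize timings given in milliseconds.
// Requires at least two samples, because it uses the sample variance.
function summarize(samplesMs: number[]): BenchSummary {
  const n = samplesMs.length;
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (n - 1 in the denominator).
  const variance =
    samplesMs.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / (n - 1);
  // Standard error of the mean.
  const sem = Math.sqrt(variance / n);
  // Assumption: 1.96 is the normal-approximation critical value for a
  // 95% confidence interval.
  const marginOfError = sem * 1.96;
  return {
    opsPerSec: 1000 / mean,
    min: Math.min(...samplesMs),
    max: Math.max(...samplesMs),
    mean,
    rme: (marginOfError / mean) * 100
  };
}

const summary = summarize([5, 6, 6, 7]);
console.log(summary);
```

A low rme means the runs were consistent, so small differences in ops/sec between two parsers are only meaningful when they exceed both parsers' margins of error.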
We removed the Rust implementation to improve maintainability and because it didn't provide the expected performance boost.
Calling a TypeScript function from Rust on every token event (wasmMix benchmark) results in slow communication, negating Rust's performance benefits. Parsing XML entirely on the Rust side (wasm benchmark) avoids frequent communication but is still too slow due to the overhead of serializing and deserializing data between JavaScript and Rust (mainly the resulting XML object). While Rust parsing without returning results is faster than any JavaScript XML parser, needing results in the JavaScript layer makes this approach impractical.
The roxmltree package with the Rust implementation can be found in the _deprecated folder (packages/_deprecated/roxmltree_wasm).
| Parser | Operations per Second (ops/sec) | Min Time (ms) | Max Time (ms) | Mean Time (ms) | Relative Margin of Error (rme) |
|---|---|---|---|---|---|
| roxmltree:text | 67.12 | 14.33 | 16.45 | 14.90 | ±1.27% |
| roxmltree:wasmMix | 28.17 | 34.83 | 36.71 | 35.49 | ±0.91% |
| roxmltree:wasm | 109.30 | 8.30 | 13.16 | 9.15 | ±3.31% |
Why port tokenizer.rs to TypeScript?

We ported tokenizer.rs to TypeScript because frequent communication between Rust and TypeScript negated Rust's performance benefits. The stream architecture required constant interaction between Rust and TypeScript via the tokenCallback, reducing overall efficiency.
We removed the byte-based implementation to enhance maintainability and because it didn't provide the expected performance improvement.
Decoding Uint8Array snippets to JavaScript strings is necessary on nearly every token event. This decoding is slow, which makes the byte-based approach less efficient than working directly with strings.
| Parser | Operations per Second (ops/sec) | Min Time (ms) | Max Time (ms) | Mean Time (ms) | Relative Margin of Error (rme) |
|---|---|---|---|---|---|
| roxmltree:text | 67.12 | 14.33 | 16.45 | 14.90 | ±1.27% |
| roxmltree:byte | 12.48 | 78.65 | 83.29 | 80.08 | ±1.15% |
The roxmltree package with the Byte-Based implementation can be found in the _deprecated folder (packages/_deprecated/roxmltree_byte-only).
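
To make the extra step in the byte-based approach concrete, the sketch below extracts the same token text from a string and from a `Uint8Array`. It does not measure anything; it only shows that the byte path needs a decode call per token. The helper names are hypothetical, and the code assumes `TextEncoder` and `TextDecoder` are available as globals (as in Node.js and browsers).

```typescript
const source = '<p>Hello World</p>';

// Assumption: TextEncoder/TextDecoder exist as globals in the runtime.
const bytes: Uint8Array = new TextEncoder().encode(source);
const decoder = new TextDecoder();

// String-based: a token's text is a plain slice of the input string.
function sliceFromString(input: string, start: number, end: number): string {
  return input.slice(start, end);
}

// Byte-based: a token's text must be decoded from a Uint8Array view
// before it can be handed to JavaScript code as a string.
function sliceFromBytes(input: Uint8Array, start: number, end: number): string {
  return decoder.decode(input.subarray(start, end));
}

// 'Hello World' occupies offsets 3..14 (end exclusive). The offsets match in
// both representations only because this sample is pure ASCII.
const fromString = sliceFromString(source, 3, 14);
const fromBytes = sliceFromBytes(bytes, 3, 14);
console.log(fromString, fromBytes);
```

With non-ASCII input, byte offsets and string indices diverge, which adds bookkeeping on top of the decode cost.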
While generators can improve developer experience, they introduce significant performance overhead. Our benchmarks show that using a generator dramatically increases the execution time compared to the callback approach. Given our focus on performance, we chose to maintain the callback implementation.
See Generator vs Iterator vs Callback for more details.
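
The two API shapes can be compared on a toy example. The sketch below splits a string into words rather than tokenizing XML, and the function names are hypothetical. It only illustrates the difference between pushing items into a callback and pulling them from a generator; it makes no timing claims.

```typescript
// Callback style: the producer pushes each item into a function supplied
// by the caller. No intermediate objects are needed per item.
function eachWord(input: string, onWord: (word: string) => void): void {
  for (const word of input.split(' ')) {
    if (word.length > 0) {
      onWord(word);
    }
  }
}

// Generator style: the consumer pulls items one at a time. Under the
// iterator protocol, every `next()` call returns a `{ value, done }` result.
function* words(input: string): Generator<string> {
  for (const word of input.split(' ')) {
    if (word.length > 0) {
      yield word;
    }
  }
}

const viaCallback: string[] = [];
eachWord('a b  c', (word) => {
  viaCallback.push(word);
});

const viaGenerator: string[] = Array.from(words('a b  c'));
console.log(viaCallback, viaGenerator);
```

Both variants yield the same items; the generator version reads more naturally with `for...of`, while the callback version avoids the per-item iterator machinery.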
[xml-tokenizer] Total Time: 5345.0000 ms | Average Time per Run: 53.4500 ms | Median Time: 53.0000 ms | Runs: 100
[txml] Total Time: 395.0000 ms | Average Time per Run: 3.9500 ms | Median Time: 4.0000 ms | Runs: 100
[fast-xml-parser] Total Time: 1290.0000 ms | Average Time per Run: 12.9000 ms | Median Time: 13.0000 ms | Runs: 100
[xml-tokenizer] Total Time: 662.0000 ms | Average Time per Run: 6.6200 ms | Median Time: 6.0000 ms | Runs: 100
[txml] Total Time: 394.0000 ms | Average Time per Run: 3.9400 ms | Median Time: 4.0000 ms | Runs: 100
[fast-xml-parser] Total Time: 1308.0000 ms | Average Time per Run: 13.0800 ms | Median Time: 13.0000 ms | Runs: 100
Benchmark implementation in Vanilla Profiler