
@streamparser/json-whatwg
Streaming JSON parser in Javascript for Node.js, Deno and the browser
Fast dependency-free library to parse a JSON stream using UTF-8 encoding in Node.js, Deno or any modern browser. Fully compliant with the JSON spec and JSON.parse(...).
tl;dr
import { JSONParser } from '@streamparser/json-whatwg';

const inputStream = new ReadableStream({
  start(controller) {
    controller.enqueue('{ "test": ["a"] }');
    controller.close();
  },
});

const parser = new JSONParser();

// Pipe the parsed values straight into a destination stream...
inputStream.pipeThrough(parser).pipeTo(destinationStream);

// ...or read the values manually:
const reader = inputStream.pipeThrough(parser).getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  processValue(value);
  // Three values will be emitted:
  // "a"
  // ["a"]
  // { test: ["a"] }
}
There are multiple flavours of @streamparser:

@streamparser/json: the plain JavaScript version.
@streamparser/json-whatwg: wraps @streamparser/json into a WHATWG TransformStream.
@streamparser/json-node: wraps @streamparser/json into a Node Transform stream.

@streamparser/json requires a few ES6 classes. If you are targeting browsers or systems in which these might be missing, you need to polyfill them.
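If you need to verify at runtime whether those classes are available, a quick feature check can help. The exact list below is an assumption for illustration; the parser's own documentation is authoritative:

```javascript
// Hypothetical runtime feature check. The class list is an assumption;
// consult the package documentation for the authoritative requirements.
const requiredClasses = {
  Uint8Array: typeof Uint8Array !== 'undefined',
  TextEncoder: typeof TextEncoder !== 'undefined',
  TextDecoder: typeof TextDecoder !== 'undefined',
  TransformStream: typeof TransformStream !== 'undefined',
};

const missing = Object.keys(requiredClasses).filter((name) => !requiredClasses[name]);
if (missing.length > 0) {
  console.warn(`Missing classes, polyfill needed: ${missing.join(', ')}`);
}
```

In Node.js 18+ and modern browsers all of these exist as globals, so `missing` stays empty.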
A JSON-compliant tokenizer that parses a UTF-8 stream into JSON tokens.
import { Tokenizer } from '@streamparser/json-whatwg';
const tokenizer = new Tokenizer(opts, writableStrategy, readableStrategy);
Writable and readable strategy are standard WhatWG Stream settings (see MDN).
The available options are:
{
  stringBufferSize: <number>, // set to 0 to disable buffering. Min valid value is 4.
  numberBufferSize: <number>, // set to 0 to disable buffering.
  separator: <string>, // separator between objects. For example `\n` for NDJSON.
  emitPartialTokens: <boolean>, // whether to emit tokens mid-parsing.
}
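For instance, a hypothetical Tokenizer configuration for newline-delimited JSON with buffered strings and numbers might look like this (the sizes are illustrative, not recommendations):

```javascript
// Illustrative Tokenizer options for NDJSON input; the values are examples.
const tokenizerOpts = {
  stringBufferSize: 64 * 1024, // buffer strings in a 64 KB TypedArray
  numberBufferSize: 64 * 1024, // buffer numbers the same way
  separator: '\n',             // one JSON document per line (NDJSON)
  emitPartialTokens: false,    // only emit complete tokens
};
```

The resulting object would then be passed as the first argument to `new Tokenizer(...)`.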
If buffer sizes are set to anything other than zero, the data is buffered using a TypedArray instead of being appended to a string as it comes in. A reasonable size could be 64 * 1024 (64 KB).
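As a standalone sketch (not the library's actual implementation), buffering incoming bytes into a preallocated TypedArray and decoding only once at the end looks like this:

```javascript
// Minimal sketch of TypedArray buffering. This is NOT the library's
// internals, just an illustration of the technique described above.
const BUFFER_SIZE = 64 * 1024;
const buffer = new Uint8Array(BUFFER_SIZE);
let length = 0;

function appendBytes(chunk) {
  // A real implementation would grow the buffer (or error) on overflow.
  buffer.set(chunk, length);
  length += chunk.length;
}

appendBytes(new TextEncoder().encode('Hello '));
appendBytes(new TextEncoder().encode('world!'));

// Decode only once, when the whole value is ready.
const value = new TextDecoder().decode(buffer.subarray(0, length)); // 'Hello world!'
```

Each `appendBytes` call writes into the same preallocated buffer, so no intermediate strings are created while the value is still incomplete.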
When parsing strings or numbers, the parser needs to gather the data in memory until the whole value is ready.

Strings are immutable in JavaScript, so every string operation creates a new string. The V8 engine, behind Node, Deno and most modern browsers, performs many different types of optimization. One of these optimizations is to over-allocate memory when it detects many string concatenations. This significantly increases memory consumption and can easily exhaust your memory when parsing JSON containing very large strings or numbers. For those cases, the parser can buffer the characters using a TypedArray. This requires encoding/decoding from/to the buffer into an actual string once the value is ready, which is done using the TextEncoder and TextDecoder APIs. Unfortunately, these APIs create significant overhead when the strings are small, so they should be used only when strictly necessary.
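These APIs also handle multi-byte UTF-8 characters split across chunk boundaries, which is exactly what happens in a byte stream. A standalone illustration, unrelated to the library's internals:

```javascript
// TextDecoder in streaming mode correctly reassembles a multi-byte UTF-8
// character that was split across two chunks.
const decoder = new TextDecoder('utf-8');
const bytes = new TextEncoder().encode('€'); // 3 bytes: 0xE2 0x82 0xAC

const part1 = decoder.decode(bytes.subarray(0, 2), { stream: true }); // incomplete -> ''
const part2 = decoder.decode(bytes.subarray(2));                      // completes the character

const result = part1 + part2; // '€'
```

Without `{ stream: true }`, the incomplete first chunk would be decoded to a replacement character instead of being held until the remaining bytes arrive.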
A token parser that processes JSON tokens as emitted by the Tokenizer and emits JSON values/objects.
import { TokenParser} from '@streamparser/json-whatwg';
const tokenParser = new TokenParser(opts, writableStrategy, readableStrategy);
Writable and readable strategy are standard WhatWG Stream settings (see MDN).
The available options are:
{
paths: <string[]>,
keepStack: <boolean>, // whether to keep all the properties in the stack
separator: <string>, // separator between objects. For example `\n` for NDJSON. If omitted or set to undefined, the token parser will end after parsing the first object. To parse multiple objects without any delimiter, set it to the empty string `''`.
emitPartialValues: <boolean>, // whether to emit values mid-parsing.
}
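For instance, a hypothetical TokenParser configuration that emits each direct child of the root value could look like this (the values are examples, not defaults):

```javascript
// Illustrative TokenParser options; the values are examples, not defaults.
const tokenParserOpts = {
  paths: ['$.*'],           // emit every direct child of the root value
  keepStack: false,         // drop emitted values from ancestors to save memory
  separator: '',            // parse back-to-back JSON documents
  emitPartialValues: false, // only emit complete values
};
```

The object would then be passed as the first argument to `new TokenParser(...)`.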
paths defaults to undefined, which emits everything. The paths are intended to support JSONPath, although at the time being only the root object selector ($) and subproperty selectors, including wildcards ($.a, $.*, $.a.b, $.*.b, etc.), are supported.

keepStack defaults to true. When set to false, values that have already been emitted are not preserved in their ancestor objects. This means that the parent object passed to the onValue function may be empty, which doesn't reflect the actual payload, but it is more memory-efficient.

The full-blown JSON parser. It basically chains a Tokenizer and a TokenParser.
import { JSONParser } from '@streamparser/json-whatwg';
const parser = new JSONParser();
You can use both components independently as
const tokenizer = new Tokenizer(opts);
const tokenParser = new TokenParser();
const jsonParser = tokenizer.pipeThrough(tokenParser);
You can subscribe to the resulting data by reading from the stream:
import { JSONParser } from '@streamparser/json-whatwg';

const parser = new JSONParser({ stringBufferSize: undefined, paths: ['$'] });

const inputStream = new ReadableStream({
  start(controller) {
    controller.enqueue('"Hello world!"');
    // Or passing the stream in several chunks:
    // controller.enqueue('"');
    // controller.enqueue('Hello');
    // controller.enqueue(' ');
    // controller.enqueue('world!');
    // controller.enqueue('"');
    controller.close();
  },
});

const reader = inputStream.pipeThrough(parser).getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value); // logs "Hello world!"
}
Any error during the parsing of the stream will error the resulting stream and be thrown when reading from it. After an error, the parser can't continue parsing.
import { JSONParser } from '@streamparser/json-whatwg';

const parser = new JSONParser({ stringBufferSize: undefined });

const inputStream = new ReadableStream({
  start(controller) {
    controller.enqueue('"""'); // invalid JSON
    controller.close();
  },
});

try {
  const reader = inputStream.pipeThrough(parser).getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(value);
  }
} catch (err) {
  console.log(err); // logs the parsing error
}
Imagine an endpoint that sends a large number of JSON objects one after the other ({"id":1}{"id":2}{"id":3}...
).
import { JSONParser} from '@streamparser/json-whatwg';
const parser = new JSONParser({ separator: '' }); // back-to-back objects, no delimiter
const response = await fetch('http://example.com/');
const reader = response.body.pipeThrough(parser).getReader();
while(true) {
const { done, value } = await reader.read();
if (done) break;
// TODO process element
}
Imagine an endpoint that sends a large JSON array of objects ([{"id":1},{"id":2},{"id":3},...]
).
import { JSONParser } from '@streamparser/json-whatwg';
const parser = new JSONParser({ stringBufferSize: undefined, paths: ['$.*'], keepStack: false });
const response = await fetch('http://example.com/');
const reader = response.body.pipeThrough(parser).getReader();
while(true) {
const { done, value: parsedElementInfo } = await reader.read();
if (done) break;
const { value, key, parent, stack } = parsedElementInfo;
// TODO process element
}
Imagine an endpoint that sends a very large JSON string ("Once upon a midnight <...>"
).
import { JSONParser } from '@streamparser/json-whatwg';
const parser = new JSONParser({ stringBufferSize: undefined, paths: ['$'], emitPartialTokens: true, emitPartialValues: true });
const response = await fetch('http://example.com/');
const reader = response.body.pipeThrough(parser).getReader();
while(true) {
const { done, value: parsedElementInfo } = await reader.read();
if (done) break;
const { value, key, parent, stack, partial } = parsedElementInfo;
if (partial) {
console.log(`Parsing value: ${value}... (still parsing)`);
} else {
console.log(`Value parsed: ${value}`);
}
}
See LICENSE.md.