What is stream-json?
The stream-json package is a Node.js library that provides a stream interface for parsing and stringifying JSON data. It allows for processing large JSON files or streams in a memory-efficient and time-efficient manner by working with JSON data in chunks rather than loading the entire file into memory.
What are stream-json's main functionalities?
Parsing JSON
Stream-json can parse JSON files of any size by breaking them down into smaller chunks and processing them one by one. This is particularly useful for working with large JSON files that cannot be loaded into memory all at once.
Stringifying JSON
The package can also stringify large JSON objects by converting them into a stream of data. This allows for efficient writing of JSON data to a file or over the network.
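The idea of stringifying as a stream can be illustrated without the library. The following is a minimal sketch that uses only Node core modules; `jsonArrayChunks` and `toJsonStream` are hypothetical helpers introduced here for illustration and are not part of stream-json. It serializes one array item at a time, so the complete JSON text never has to exist in memory at once.

```javascript
const {Readable} = require('stream');

// Hypothetical helper (not part of stream-json): yields the text of a JSON
// array piece by piece, serializing a single item at a time.
function* jsonArrayChunks(items) {
  yield '[';
  let first = true;
  for (const item of items) {
    if (!first) yield ',';
    first = false;
    // JSON.stringify() returns undefined for values such as undefined or
    // functions; inside an array, JSON represents them as null.
    yield JSON.stringify(item) ?? 'null';
  }
  yield ']';
}

// Wraps the generator in a readable text stream that can be piped
// to a file or a network socket.
const toJsonStream = items =>
  Readable.from(jsonArrayChunks(items), {objectMode: false});
```

Because `items` can be any iterable, including a generator that produces records lazily, something like `toJsonStream(rows).pipe(fs.createWriteStream('out.json'))` would write the array without building it first.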
Filters and Transforms
Stream-json provides filters and transformation tools to selectively process and modify JSON data as it is being streamed. This can be used to extract or transform specific parts of the JSON data without having to manipulate the entire dataset.
Other packages similar to stream-json
JSONStream
JSONStream is a package similar to stream-json that offers streaming JSON.parse and stringify. It is widely used and has a simple API, but stream-json provides a more modular approach with plugins and a richer set of features for filtering and transforming data.
big-json
big-json provides similar functionality for parsing and stringifying large JSON files. It uses a streaming approach to handle large files, but stream-json has a more comprehensive set of tools for dealing with streams and allows for more complex processing pipelines.
stream-json
stream-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors oriented on processing huge JSON files while requiring a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual primitive data items (keys, strings, and numbers) can be streamed piece-wise. A streaming SAX-inspired event-based API is included as well.
Available components:
- Streaming JSON Parser.
  - It produces a SAX-like token stream.
  - Optionally it can pack keys, strings, and numbers (controlled separately).
  - The main module provides helpers to create a parser.
- Filters to edit a token stream:
  - Pick selects desired objects.
    - It can produce multiple top-level objects, just like in the JSON Streaming protocol.
    - Don't forget to use StreamValues when picking several subobjects!
  - Replace substitutes objects with a replacement.
  - Ignore removes objects.
  - Filter filters tokens maintaining stream's validity.
- Streamers to produce a stream of JavaScript objects.
  - StreamValues can handle a stream of JSON objects.
    - Useful to stream objects selected by Pick, or generated by other means.
    - It supports the JSON Streaming protocol, where individual values are separated semantically (like in "{}[]"), or with white spaces (like in "true 1 null").
  - StreamArray takes an array of objects and produces a stream of its components.
    - It streams array components individually taking care of assembling them automatically.
    - Created initially to deal with JSON files similar to Django-produced database dumps.
    - Only one top-level array per stream is valid!
  - StreamObject takes an object and produces a stream of its top-level properties.
    - Only one top-level object per stream is valid!
- Essentials:
  - Assembler interprets a token stream creating JavaScript objects.
  - Disassembler produces a token stream from JavaScript objects.
  - Stringer converts a token stream back into a JSON text stream.
  - Emitter reads a token stream and emits each token as an event.
    - It can greatly simplify data processing.
- Utilities:
  - emit() makes any stream component emit tokens as events.
  - withParser() helps to create stream components with a parser.
  - Batch batches items into arrays to simplify their processing.
  - Verifier reads a stream and verifies that it is valid JSON.
- Special helpers:
  - JSONL AKA JSON Lines:
    - jsonl/Parser parses a JSONL file producing objects similar to StreamValues.
      - Useful when we know that individual items can fit in memory.
      - Generally it is faster than the equivalent combination of Parser({jsonStreaming: true}) + StreamValues.
    - jsonl/Stringer produces a JSONL file from a stream of JavaScript objects.
      - Generally it is faster than the equivalent combination of Disassembler + Stringer.
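To illustrate why JSONL is comparatively cheap to stream, here is a minimal sketch that uses only Node core modules. It is not the implementation of jsonl/Parser: `LineBuffer` and `jsonlParser` are hypothetical names, and the `{key, value}` output shape is an assumption modeled on the StreamValues-like objects mentioned in the list above. Each line is a complete JSON value, so the only state needed between chunks is the unfinished tail of the last line.

```javascript
const {Transform} = require('stream');
const {StringDecoder} = require('string_decoder');

// Hypothetical helper: accumulates text and returns the parsed values
// of every complete line seen so far.
class LineBuffer {
  constructor() {
    this.rest = '';
  }
  feed(text) {
    const lines = (this.rest + text).split('\n');
    this.rest = lines.pop(); // the last piece may be an incomplete line
    return lines.filter(line => line.trim()).map(line => JSON.parse(line));
  }
  flush() {
    const tail = this.rest.trim();
    this.rest = '';
    return tail ? [JSON.parse(tail)] : [];
  }
}

// Hypothetical component: bytes in, {key, value} objects out
// (the output shape is an assumption, see the note above).
const jsonlParser = () => {
  const decoder = new StringDecoder('utf8'); // keeps split multi-byte characters intact
  const lines = new LineBuffer();
  let key = 0;
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      try {
        for (const value of lines.feed(decoder.write(chunk))) {
          this.push({key: key++, value});
        }
      } catch (error) {
        return callback(error); // a malformed line fails the stream
      }
      callback();
    },
    flush(callback) {
      try {
        const last = [...lines.feed(decoder.end()), ...lines.flush()];
        for (const value of last) this.push({key: key++, value});
      } catch (error) {
        return callback(error);
      }
      callback();
    }
  });
};
```

A stream created this way could be placed after `fs.createReadStream()` with `.pipe()`. The trade-off matches the note in the list: every individual item must fit in memory, because each line is parsed with a single `JSON.parse()` call.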
All components are meant to be building blocks to create flexible custom data processing pipelines. They can be extended and/or combined with custom code. They can be used together with stream-chain to simplify data processing.
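To make the token stream that these building blocks exchange more concrete, here is a minimal sketch of the assembling step written as a plain function. It is not the library's Assembler: `assemble` is a hypothetical name, the token names (`startObject`, `keyValue`, `numberValue`, and so on) are assumed to follow the SAX-like vocabulary, and only packed tokens are handled.

```javascript
// Hypothetical sketch: rebuilds a JavaScript value from an iterable of tokens.
// Token names are an assumption; streamed chunks of keys, strings, and
// numbers are not handled here.
function assemble(tokens) {
  const stack = []; // frames of the enclosing containers
  let current = null; // frame being filled: {container, key}
  let result;

  const addValue = value => {
    if (!current) {
      result = value; // a top-level value
    } else if (Array.isArray(current.container)) {
      current.container.push(value);
    } else {
      current.container[current.key] = value;
    }
  };

  for (const token of tokens) {
    switch (token.name) {
      case 'startObject':
      case 'startArray':
        stack.push(current);
        current = {container: token.name === 'startObject' ? {} : [], key: null};
        break;
      case 'endObject':
      case 'endArray': {
        const finished = current.container;
        current = stack.pop();
        addValue(finished); // attach the finished container to its parent
        break;
      }
      case 'keyValue':
        current.key = token.value;
        break;
      case 'stringValue':
        addValue(token.value);
        break;
      case 'numberValue':
        addValue(Number(token.value)); // numbers are assumed to arrive as text
        break;
      case 'nullValue':
        addValue(null);
        break;
      case 'trueValue':
        addValue(true);
        break;
      case 'falseValue':
        addValue(false);
        break;
    }
  }
  return result;
}
```

Under these assumptions, the tokens for `{"a":[1,"x"]}` would be `startObject`, `keyValue` ("a"), `startArray`, `numberValue` ("1"), `stringValue` ("x"), `endArray`, `endObject`. Filters can work on this flat sequence without ever materializing the object.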
This toolkit is distributed under the New BSD license.
Introduction
const {chain} = require('stream-chain');

const {parser} = require('stream-json');
const {pick} = require('stream-json/filters/Pick');
const {ignore} = require('stream-json/filters/Ignore');
const {streamValues} = require('stream-json/streamers/StreamValues');

const fs = require('fs');
const zlib = require('zlib');

// read a gzipped JSON file, select the "data" subtree, drop "_meta" parts,
// assemble the remaining values, and keep only the accounting department
const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  data => {
    const value = data.value;
    // returning null drops the item from the pipeline
    return value && value.department === 'accounting' ? data : null;
  }
]);

// count the items that made it through the pipeline
let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));
See the full documentation in Wiki.
Companion projects:
- stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.
Installation
npm install --save stream-json
Use
The whole library is organized as a set of small components, which can be combined to produce the most effective pipeline. All components are based on node.js streams and events. They implement all required standard APIs. It is easy to add your own components to solve your unique tasks.
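As a rough sketch of what such a custom component might look like, the following uses only Node's core `stream` module. `keepIf` is a hypothetical name, and the `{key, value}` item shape is an assumption about what the preceding streamer emits.

```javascript
const {Transform} = require('stream');

// Hypothetical custom component: forwards only the items whose `value`
// satisfies the predicate and drops the rest.
const keepIf = predicate =>
  new Transform({
    objectMode: true, // items are JavaScript objects, not bytes
    transform(item, _encoding, callback) {
      let keep;
      try {
        keep = predicate(item.value);
      } catch (error) {
        return callback(error); // a failing predicate fails the stream
      }
      // passing the item forwards it; passing undefined emits nothing
      callback(null, keep ? item : undefined);
    }
  });
```

Since the result is an ordinary object-mode Transform, it can be connected with `.pipe()` like any other node.js stream, for example `keepIf(value => value.department === 'accounting')` placed after a streamer.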
The code of all components is compact and simple. Please take a look at their source code to see how things are implemented, so you can produce your own components in no time.
If you find a bug, see a way to simplify existing components, or create new generic components that can be reused in a variety of projects, don't hesitate to open a ticket and/or create a pull request.
Release History
- 1.6.0 added jsonl/Parser and jsonl/Stringer.
- 1.5.0 Disassembler and streamers now follow JSON.stringify() and JSON.parse() protocols respectively including replacer and reviver.
- 1.4.1 bugfix: Stringer with makeArray should produce empty array if no input.
- 1.4.0 added makeArray functionality to Stringer. Thx all who asked for it!
- 1.3.3 bugfix: very large/infinite streams with garbage didn't fail. Thx Arne Marschall!
- 1.3.2 bugfix: filters could fail with packed-only token streams. Thx Trey Brisbane!
- 1.3.1 bugfix: reverted the last bugfix in Verifier, a bugfix in tests, thx Guillermo Ares.
- 1.3.0 added Batch, a bugfix in Verifier.
- 1.2.1 the technical release.
- 1.2.0 added Verifier.
- 1.1.4 fixed Filter going haywire, thx @codebling!
- 1.1.3 fixed Parser streaming numbers when shouldn't, thx Grzegorz Lachowski!
- 1.1.2 fixed Stringer not escaping some symbols, thx Pavel Bardov!
- 1.1.1 minor updates in docs and comments.
- 1.1.0 added Disassembler.
- 1.0.3 minor tweaks, added TypeScript typings and the badge.
- 1.0.2 minor tweaks, documentation improvements.
- 1.0.1 reorg to fix export problems.
- 1.0.0 the first 1.0 release.
- 0.6.1 the technical release.
- 0.6.0 added Stringer to convert event streams back to JSON.
- 0.5.3 bug fix to allow empty JSON Streaming.
- 0.5.2 bug fixes in Filter.
- 0.5.1 corrected README.
- 0.5.0 added support for JSON Streaming.
- 0.4.2 refreshed dependencies.
- 0.4.1 added StreamObject by Sam Noedel.
- 0.4.0 new high-performant Combo component, switched to the previous parser.
- 0.3.0 new even faster parser, bug fixes.
- 0.2.2 refreshed dependencies.
- 0.2.1 added utilities to filter objects on the fly.
- 0.2.0 new faster parser, formal unit tests, added utilities to assemble objects on the fly.
- 0.1.0 bug fixes, more documentation.
- 0.0.5 bug fixes.
- 0.0.4 improved grammar.
- 0.0.3 the technical release.
- 0.0.2 bug fixes.
- 0.0.1 the initial release.