jsonld-streaming-parser
A fast and lightweight streaming and 100% spec-compliant JSON-LD parser, with RDFJS representations of RDF terms, quads and triples.
The streaming nature allows triples to be emitted as soon as possible, and documents larger than memory to be parsed.
$ npm install jsonld-streaming-parser
or
$ yarn add jsonld-streaming-parser
This package also works out-of-the-box in browsers via tools such as webpack and browserify.
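For example, a standalone browser bundle could be produced with browserify as follows (a minimal sketch; main.js and bundle.js are placeholder file names, not part of this package):
$ browserify main.js -o bundle.js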
import {JsonLdParser} from "jsonld-streaming-parser";
or
const JsonLdParser = require("jsonld-streaming-parser").JsonLdParser;
JsonLdParser is a Node Transform stream that takes in chunks of JSON-LD data and outputs RDFJS-compliant quads. Streams can be piped into it, or strings can be written into the parser directly.
// Parse a JSON-LD file by piping its read stream into the parser.
const fs = require('fs');

const myParser = new JsonLdParser();
fs.createReadStream('myfile.jsonld')
  .pipe(myParser)
  .on('data', console.log)
  .on('error', console.error)
  .on('end', () => console.log('All triples were parsed!'));
const myParser = new JsonLdParser();
myParser
.on('data', console.log)
.on('error', console.error)
.on('end', () => console.log('All triples were parsed!'));
myParser.write('{');
myParser.write(`"@context": "https://schema.org/",`);
myParser.write(`"@type": "Recipe",`);
myParser.write(`"name": "Grandma's Holiday Apple Pie",`);
myParser.write(`"aggregateRating": {`);
myParser.write(`"@type": "AggregateRating",`);
myParser.write(`"ratingValue": "4"`);
myParser.write(`}}`);
myParser.end();
This parser implements the RDFJS Sink interface, which makes it possible to alternatively parse streams using the import method.
const myParser = new JsonLdParser();
const myTextStream = fs.createReadStream('myfile.jsonld');
myParser.import(myTextStream)
.on('data', console.log)
.on('error', console.error)
.on('end', () => console.log('All triples were parsed!'));
Using a context event listener, you can collect all detected contexts.
const myParser = new JsonLdParser();
const myTextStream = fs.createReadStream('myfile.jsonld');
myParser.import(myTextStream)
.on('context', console.log)
.on('data', console.error)
.on('error', console.error)
.on('end', () => console.log('All triples were parsed!'));
Optionally, the following parameters can be set in the JsonLdParser constructor:

dataFactory: A custom RDFJS DataFactory to construct terms and triples. (Default: require('@rdfjs/data-model'))
context: An optional root context to use while parsing. This can be anything that is accepted by jsonld-context-parser, such as a URL, object or array. (Default: {})
baseIRI: An initial default base IRI. (Default: '')
allowOutOfOrderContext: If @context definitions should be allowed as non-first object entries. When enabled, streaming results may not come as soon as possible, and will be buffered until the end when no context is defined at all. (Default: false)
documentLoader: A custom loader for fetching remote contexts. This can be set to anything that implements IDocumentLoader; see the caching sketch after the constructor example below. (Default: FetchDocumentLoader)
produceGeneralizedRdf: If blank node predicates should be allowed; they will be ignored otherwise. (Default: false)
processingMode: The maximum JSON-LD version that should be processable by this parser. (Default: 1.0)
errorOnInvalidIris: By default, JSON-LD requires that all properties (or @id's) that are not URIs, that are unknown keywords, or that do not occur in the context be silently dropped. When setting this value to true, an error will be thrown when such properties occur. This is useful for debugging JSON-LD documents. (Default: false)
allowSubjectList: If RDF lists can appear in the subject position. (Default: false)
validateValueIndexes: If @index inside array nodes should be validated, i.e., nodes inside the same array with the same @id should have equal @index values. This is not applicable to this parser as we don't do explicit flattening, but it is required to be spec-compliant. (Default: false)
defaultGraph: The default graph for constructing quads. (Default: defaultGraph())

For example:
new JsonLdParser({
dataFactory: require('@rdfjs/data-model'),
context: 'https://schema.org/',
baseIRI: 'http://example.org/',
allowOutOfOrderContext: false,
documentLoader: new FetchDocumentLoader(),
produceGeneralizedRdf: false,
processingMode: '1.0',
errorOnInvalidIris: false,
allowSubjectList: false,
validateValueIndexes: false,
defaultGraph: namedNode('http://example.org/graph'),
});
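As an illustration of the documentLoader option, here is a minimal sketch of a caching loader that wraps the default FetchDocumentLoader so that each remote context is fetched only once. It assumes that IDocumentLoader exposes a single load(url) method returning a promise of the parsed context, and that FetchDocumentLoader is exported by jsonld-context-parser; CachingDocumentLoader is a hypothetical name, not part of this package.
const {FetchDocumentLoader} = require('jsonld-context-parser');

// Hypothetical caching wrapper around the default context loader.
class CachingDocumentLoader {
  constructor() {
    this.loader = new FetchDocumentLoader(); // Delegates the actual fetching
    this.cache = new Map();                  // Maps context URL -> promise of the parsed context
  }

  load(url) {
    if (!this.cache.has(url)) {
      this.cache.set(url, this.loader.load(url));
    }
    return this.cache.get(url);
  }
}

const myCachingParser = new JsonLdParser({
  documentLoader: new CachingDocumentLoader(),
});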
This parser does not follow the recommended procedure for transforming JSON-LD to RDF, because that procedure does not allow stream-based handling of JSON. Instead, this tool introduces an alternative streaming algorithm that achieves spec-compliant JSON-LD parsing.
This parser builds on top of the jsonparse library, which is a sax-based streaming JSON parser. On top of this, several in-memory stacks are maintained, which accumulate the information required to emit triples/quads. These stacks are discarded as soon as they are no longer needed, to limit memory usage.
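To give an idea of the events such a sax-based parser produces, here is a minimal sketch that uses jsonparse directly (it is not code from this package): every completed value is reported together with a key path derived from the parser's internal stack, and this package maintains its own, richer stacks on top of these events to derive subjects, predicates and objects.
const Parser = require('jsonparse');

const jsonParser = new Parser();
jsonParser.onValue = function (value) {
  // `this.stack` holds the frames of all ancestor objects/arrays,
  // `this.key` is the key (or array index) of the value that was just completed.
  const path = this.stack.slice(1).map((frame) => frame.key).concat([this.key]);
  console.log(path, value);
};

// Chunks may be written as they arrive; values are emitted as soon as they are complete.
jsonParser.write('{"name": "Jane Doe", "jobTi');
jsonParser.write('tle": "Professor"}');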
The algorithm makes a couple of (soft) assumptions regarding the structure of the JSON-LD document, which hold for most typical JSON-LD documents:

If a @context is present, it is the first entry of an object.
If an @id is present, it comes right after the @context, or it is the first entry of an object.

If these assumptions are met, (almost) each object entry corresponds to a triple/quad that can be emitted. For example, the following document allows a triple to be emitted after each object entry (except for the first two lines):
{
"@context": "http://schema.org/",
"@id": "http://example.org/",
"@type": "Person", // --> <http://example.org/> a schema:Person.
"name": "Jane Doe", // --> <http://example.org/> schema:name "Jane Doe".
"jobTitle": "Professor", // --> <http://example.org/> schema:jobTitle "Professor".
"telephone": "(425) 123-4567", // --> <http://example.org/> schema:telephone "(425) 123-4567".
"url": "http://www.janedoe.com" // --> <http://example.org/> schema:url <http://www.janedoe.com>.
}
If not all of these assumptions are met, entries of an object are buffered until enough information becomes available or until the object is closed.
For example, if no @id is present, values are buffered until an @id is read or the object is closed:
{
"@context": "http://schema.org/",
"@type": "Person",
"name": "Jane Doe",
"jobTitle": "Professor",
"@id": "http://example.org/", // --> <http://example.org/> a schema:Person.
// --> <http://example.org/> schema:name "Jane Doe".
// --> <http://example.org/> schema:jobTitle "Professor".
"telephone": "(425) 123-4567", // --> <http://example.org/> schema:telephone "(425) 123-4567".
"url": "http://www.janedoe.com" // --> <http://example.org/> schema:url <http://www.janedoe.com>.
}
As such, JSON-LD documents that meet these requirements will be parsed very efficiently. Other documents will still be parsed correctly as well, with a slightly lower efficiency.
By default, this parser is not 100% spec-compliant. The main reason is that this is a streaming parser, and some edge cases are very inefficient to handle in a streaming manner.
However, by changing a couple of settings, it can easily be made fully spec-compliant. The downside is that the whole document will essentially be loaded into memory before results are emitted, which voids the main benefit of this parser.
const mySpecCompliantParser = new JsonLdParser({
allowOutOfOrderContext: true,
validateValueIndexes: true,
});
Concretely, this parser implements the following JSON-LD specifications:
The following table shows some simple performance comparisons between JSON-LD Streaming Parser and jsonld.js.
These basic experiments show that, even though streaming parsers are typically significantly slower than regular parsers, JSON-LD Streaming Parser still achieves performance similar to jsonld.js for most typical JSON-LD files. However, for expanded JSON-LD documents, JSON-LD Streaming Parser is around 3 to 4 times slower.
File | JSON-LD Streaming Parser | jsonld.js |
---|---|---|
toRdf-manifest.jsonld (999 triples) | 683.964ms (38MB) | 708.975ms (40MB) |
sparql-init.json (69 triples) | 931.698ms (40MB) | 1088.607ms (47MB) |
person.json (5 triples) | 309.419ms (30MB) | 313.138ms (41MB) |
dbpedia-10000-expanded.json (10,000 triples) | 785.557ms (70MB) | 202.363ms (62MB) |
Tested files:

toRdf-manifest.jsonld: The JSON-LD toRdf test manifest. A typical JSON-LD file with a single context.
sparql-init.json: A Comunica configuration file. A JSON-LD file with a large number of complex, nested, and remote contexts.
person.json: A very small JSON-LD example from the JSON-LD playground.
dbpedia-10000-expanded.json: The first 10,000 triples of DBpedia in expanded JSON-LD.

This software is written by Ruben Taelman.
This code is released under the MIT license.