RDF Dereference

This library dereferences URLs to get their RDF contents.
This tool is useful in situations where you have a URL,
and you just need the parsed triples/quads,
without having to concern yourself with determining the correct content type and picking the correct parser.
RDF contents are returned as an RDF stream with RDFJS-compliant quads.
This library takes care of all the necessary boilerplate automatically,
such as content negotiation to obtain an appropriate RDF serialization, decompression, following redirects, setting base URLs, and so on.
If the server did not emit any content type, then the content type will be guessed based on well-known extensions.
The following RDF serializations are supported:
Name | Content type | Extensions |
---|---|---|
TriG | application/trig | .trig |
N-Quads | application/n-quads | .nq, .nquads |
Turtle | text/turtle | .ttl, .turtle |
N-Triples | application/n-triples | .nt, .ntriples |
Notation3 | text/n3 | .n3 |
JSON-LD | application/ld+json, application/json | .json, .jsonld |
RDF/XML | application/rdf+xml | .rdf, .rdfxml, .owl |
RDFa and script RDF data tags HTML/XHTML | text/html, application/xhtml+xml | .html, .htm, .xhtml, .xht |
RDFa in SVG/XML | image/svg+xml, application/xml | .xml, .svg, .svgz |
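The extension-based fallback mentioned above can be illustrated with a small lookup. This is only a sketch of the idea, not the library's actual implementation: the `guessContentType` helper and its table are hypothetical, and where a table row lists several content types, the content type chosen per extension is an assumption.

```javascript
// Hypothetical lookup table derived from the extensions listed in the table above.
// Where a row lists several content types, the pick per extension is an assumption.
const EXTENSION_TO_CONTENT_TYPE = new Map([
  ['trig', 'application/trig'],
  ['nq', 'application/n-quads'],
  ['nquads', 'application/n-quads'],
  ['ttl', 'text/turtle'],
  ['turtle', 'text/turtle'],
  ['nt', 'application/n-triples'],
  ['ntriples', 'application/n-triples'],
  ['n3', 'text/n3'],
  ['json', 'application/ld+json'],
  ['jsonld', 'application/ld+json'],
  ['rdf', 'application/rdf+xml'],
  ['rdfxml', 'application/rdf+xml'],
  ['owl', 'application/rdf+xml'],
  ['html', 'text/html'],
  ['htm', 'text/html'],
  ['xhtml', 'application/xhtml+xml'],
  ['xht', 'application/xhtml+xml'],
  ['xml', 'application/xml'],
  ['svg', 'image/svg+xml'],
  ['svgz', 'image/svg+xml'],
]);

// Returns a content type guessed from the extension of the URL's path,
// or undefined when there is no known extension.
function guessContentType(url) {
  const { pathname } = new URL(url); // the query string and fragment are ignored
  const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1);
  const dot = lastSegment.lastIndexOf('.');
  if (dot === -1) return undefined;
  return EXTENSION_TO_CONTENT_TYPE.get(lastSegment.slice(dot + 1).toLowerCase());
}

console.log(guessContentType('http://example.org/data.ttl')); // text/turtle
console.log(guessContentType('http://example.org/data.nq?download=1')); // application/n-quads
console.log(guessContentType('http://example.org/resource')); // undefined
```

Only the last path segment is inspected, so a dot in a directory name (for example `/v1.2/resource`) does not count as an extension.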
Internally, this library makes use of RDF parsers from the Comunica framework,
which enable streaming processing of RDF.
If you need something more low-level with more control, have a look at rdf-parse.
Installation
$ npm install rdf-dereference
or
$ yarn add rdf-dereference
This package also works out-of-the-box in browsers via tools such as webpack and browserify.
Require
import rdfDereferencer from "rdf-dereference";
or
const rdfDereferencer = require("rdf-dereference").default;
Usage
Dereferencing an RDF document
The rdfDereferencer.dereference method accepts a URL,
and outputs a promise resolving to an object containing a quad stream.
const { quads } = await rdfDereferencer.dereference('http://dbpedia.org/page/12_Monkeys');
quads.on('data', (quad) => console.log(quad))
.on('error', (error) => console.error(error))
.on('end', () => console.log('All done!'));
Such a stream is useful when the RDF document is huge,
and you want to process it in a memory-efficient way.
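As a rough illustration of memory-efficient processing, the sketch below tallies quads per predicate while the stream flows, so only the counters are held in memory. The `countByPredicate` helper is hypothetical, and the demo stream is a stand-in built with Node's `Readable.from` from plain objects shaped like RDFJS quads; in practice you would pass the `quads` stream obtained from `dereference`.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: counts quads per predicate IRI as they arrive.
// Resolves with a Map once the stream ends, and rejects on a stream error.
function countByPredicate(quadStream) {
  return new Promise((resolve, reject) => {
    const counts = new Map();
    quadStream
      .on('data', (quad) => {
        const key = quad.predicate.value;
        counts.set(key, (counts.get(key) || 0) + 1);
      })
      .on('error', reject)
      .on('end', () => resolve(counts));
  });
}

// Stand-in for the `quads` stream (assumption): objects with only the fields used here.
const demoStream = Readable.from([
  { predicate: { value: 'http://xmlns.com/foaf/0.1/name' } },
  { predicate: { value: 'http://xmlns.com/foaf/0.1/knows' } },
  { predicate: { value: 'http://xmlns.com/foaf/0.1/name' } },
]);

// Logs a count of 2 for foaf:name and 1 for foaf:knows.
countByPredicate(demoStream).then((counts) => console.log(Object.fromEntries(counts)));
```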
Dereferencing works with any kind of RDF serialization,
even HTML documents containing RDFa and JSON-LD:
// Rename while destructuring: the result object exposes its stream as `quads`.
const { quads: quads1 } = await rdfDereferencer.dereference('https://www.rubensworks.net/');
const { quads: quads2 } = await rdfDereferencer.dereference('https://www.netflix.com/title/80180182');
Importing the resulting quads into a store
These resulting quads can easily be stored in a more convenient data structure
using tools such as rdf-store-stream:
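import {storeStream} from "rdf-store-stream";
// Assumption: any RDFJS data factory can create the term; rdf-data-factory is used here.
import {DataFactory} from "rdf-data-factory";
const factory = new DataFactory();
const store = await storeStream(quads);
const resultStream = store.match(factory.namedNode('http://example.org/subject'));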
Advanced features
Determining the final URL
If dereferencing went through various redirects, it may be useful to determine the final URL.
This can be done using the url field of the output object:
const { quads, url } = await rdfDereferencer.dereference('https://www.netflix.com/title/80180182');
console.log(url);
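One way to use the final URL is to detect whether a redirect happened and to resolve relative IRIs against the location the document was actually retrieved from. The sketch below relies only on the standard `URL` class; the helper names and the example URLs are hypothetical.

```javascript
// Hypothetical helper: compares the requested URL with the final `url` from the result object.
function describeRedirect(requestedUrl, finalUrl) {
  const requested = new URL(requestedUrl);
  const resolved = new URL(finalUrl);
  return {
    redirected: requested.href !== resolved.href,
    crossOrigin: requested.origin !== resolved.origin,
  };
}

// Hypothetical helper: resolves a relative IRI against the final URL as base.
function resolveAgainstFinalUrl(relativeIri, finalUrl) {
  return new URL(relativeIri, finalUrl).href;
}

const requestedUrl = 'http://example.org/resource';
const finalUrl = 'https://example.org/data/resource.ttl'; // assumed value of `url`

console.log(describeRedirect(requestedUrl, finalUrl)); // { redirected: true, crossOrigin: true }
console.log(resolveAgainstFinalUrl('other.ttl', finalUrl)); // https://example.org/data/other.ttl
```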
Triples or Quads
Some RDF serializations, such as Turtle and N-Triples, don't support named graphs.
In some cases, it may be valuable to know whether or not an RDF document was serialized with such a format.
If this is the case, the triples flag will be set to true on the resulting object:
const { quads, triples } = await rdfDereferencer.dereference('https://ruben.verborgh.org/profile/');
console.log(triples);
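A possible use of this flag is choosing an output format when re-serializing the data. The sketch below is illustrative only: `pickOutputContentType` is a hypothetical helper, and preferring Turtle or TriG is an assumption rather than something the library prescribes.

```javascript
// Hypothetical helper: chooses a content type for re-serializing dereferenced data.
// `triples` is the flag from the dereference() result; the chosen formats are assumptions.
function pickOutputContentType(triples) {
  // A triples-only source has no named graphs, so Turtle can represent all of it.
  // Otherwise fall back to TriG, which can express named graphs.
  return triples ? 'text/turtle' : 'application/trig';
}

console.log(pickOutputContentType(true)); // text/turtle
console.log(pickOutputContentType(false)); // application/trig
```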
License
This software is written by Ruben Taelman.
This code is released under the MIT license.