RDF Canonicalization in TypeScript
This is an implementation (under development) of the RDF Dataset Canonicalization algorithm, also referred to as URDNA2015. (The algorithm is being specified by the W3C RDF Dataset Canonicalization and Hash Working Group.)
The specification is not yet final. This implementation aims at reflecting the specification exactly, which means it may evolve alongside the specification, even when the changes are editorial only.
Requirements
The implementation depends on the interfaces defined by the RDF/JS Data model specification for RDF terms, named and blank nodes, and quads. It also depends on an instance of an RDF Data Factory, as specified by the same specification. For TypeScript, the necessary type definitions are available through the @rdfjs/types package; an implementation of the RDF Data Factory is provided by, for example, the n3 package (among others), which also provides a Turtle/TriG parser and serializer that can be used to test the library.
By default (i.e., unless explicitly specified otherwise), the Data Factory of the n3 package is used.
An input RDF Dataset may be represented by:
- A Set of Quad instances;
- An Array of Quad instances;
- An instance of a Core Dataset.
The canonicalization process returns:
- A Set or an Array of Quad instances, if the input was a Set or an Array, respectively;
- An instance of a Core Dataset, if the input was a Core Dataset and the canonicalizer was initialized with an instance of a Dataset Core Factory; otherwise, a Set of Quad instances.
The separate testing folder includes a tiny application that runs the official specification tests, and can be used as an example for the additional packages that are required.
Installation
The usual npm installation can be used:
npm install rdfjs-c14n
The package has been written in TypeScript but is distributed in JavaScript; the type definitions (i.e., index.d.ts) are included in the distribution.
Usage
There is more detailed documentation of the classes and types on GitHub. The basic usage may be as follows:
import * as n3 from 'n3';
import type * as rdf from '@rdfjs/types';
import { RDFCanon, Quads, quadsToNquads } from 'rdfjs-c14n';

function main() {
    const canonicalizer = new RDFCanon(n3.DataFactory);
    const input = parseYourFavoriteTriGIntoQuads();
    const normalized: Quads = canonicalizer.canonicalize(input);
    const hash = canonicalizer.hash(normalized);
    const nquads = canonicalizer.toNquads(normalized);
}
Additional features
Choice of hash
The RDF Dataset Canonicalization algorithm relies extensively on hashing. By default, as specified by the document, the hash function is 'sha256'. This default can be changed via the
canonicalizer.setHashAlgorithm(algorithm);
method, where algorithm can be any hash function identifier accepted by the underlying OpenSSL environment (as used by node.js). Examples are 'sha256', 'sha512', etc. On recent releases of OpenSSL, openssl list -digest-algorithms displays the available algorithms.
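From within Node.js, the supported digests can also be queried programmatically via the standard crypto module; this is generic Node.js functionality, not part of this package, so it can be used to sanity-check an identifier before passing it to setHashAlgorithm():

```typescript
import { getHashes, createHash } from 'node:crypto';

// List the digest algorithms supported by the underlying OpenSSL build.
const available = getHashes();
console.log(available.includes('sha256')); // prints true: the default algorithm is available

// Check a candidate identifier, then compute a digest with it.
const algorithm = 'sha512';
const supported = available.includes(algorithm);
const digest = createHash(algorithm).update('example').digest('hex');
console.log(supported, digest.length); // a sha512 hex digest is 128 characters long
```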
Logging
The canonicalization algorithm has built-in logging points that can be followed via a logger. This is only of interest for debugging the algorithm itself; it can be safely ignored by the average user. By default, no logging happens.
A built-in logger can be switched on which displays logging information in YAML. To use this YAML logger, do the following:
import { YamlLogger, LogLevels } from 'rdfjs-c14n';
function main() {
    …
    const canonicalizer = new RDFCanon();
    const logger = new YamlLogger(logLevel); // logLevel is one of the LogLevels values
    canonicalizer.setLogger(logger);
    …
    console.log(logger.log);
}
See the interface specification for Logger if you want to implement your own logger.
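As a starting point, a custom logger might simply collect entries into a string, similar in spirit to how the built-in YamlLogger exposes its output via logger.log. Note that the interface shape below is an illustrative guess; the authoritative Logger interface is the one defined by the rdfjs-c14n package itself:

```typescript
// Illustrative only: the method names of this interface are assumptions
// made for the sake of the sketch, not the package's authoritative Logger.
interface IllustrativeLogger {
    debug(message: string): void;
    info(message: string): void;
    warn(message: string): void;
    error(message: string): void;
}

// A minimal logger that accumulates every entry into a single string.
class CollectingLogger implements IllustrativeLogger {
    log = '';
    private entry(level: string, message: string): void {
        this.log += `[${level}] ${message}\n`;
    }
    debug(message: string): void { this.entry('debug', message); }
    info(message: string): void  { this.entry('info', message); }
    warn(message: string): void  { this.entry('warn', message); }
    error(message: string): void { this.entry('error', message); }
}

const logger = new CollectingLogger();
logger.info('canonicalization started');
console.log(logger.log); // prints "[info] canonicalization started"
```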
Maintainer: @iherman