RDF Canonicalization in TypeScript
This is an implementation of the RDF Dataset Canonicalization algorithm, also referred to as RDFC-1.0. (The algorithm is being specified by the W3C RDF Dataset Canonicalization and Hash Working Group.)
The specification is not yet final. This implementation aims to reflect the specification exactly, which means it may evolve alongside the specification even if the changes are only editorial.
Requirements
The implementation depends on the interfaces defined by the RDF/JS Data model specification for RDF terms, named and blank nodes, and quads. It also depends on an instance of an RDF Data Factory, specified by the same specification. For TypeScript, the necessary type specifications are available through the @rdfjs/types package; an implementation of the RDF Data Factory is provided by, for example, the n3 package (but there are others), which also provides a Turtle/TriG parser and serializer to test the library.
By default (i.e., if not explicitly specified) the Data Factory of the n3 package is used.
An input RDF Dataset may be represented by a Set of Quad instances, an Array of Quad instances, or a string containing an N-Quads document.
The canonicalization process can be invoked by
- the canonicalize method, which returns an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids
- the canonicalizeDetailed method, which returns an Object of the form:
  - dataset: a Set or Array of Quad instances, using the canonical blank node ids
  - dataset_nquad: an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids
  - bnode_id_map: a Map object, mapping the original blank node ids (as used in the input) to their canonical equivalents
The returned dataset is:
- A Set or an Array of Quad instances, if the input was a Set or an Array, respectively;
- A Set of Quad instances if the input was an N-Quads document.
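The input and output shapes described above can be summarized as TypeScript types. This is only an illustrative sketch: the local Quad interface is a simplified stand-in for the RDF/JS Quad type, and the names InputDataset, DetailedResult, inputKind, and outputContainer are hypothetical, not part of the package's API. Only the field names dataset, dataset_nquad, and bnode_id_map come from the description above; treating blank node ids as plain strings is an assumption.

```typescript
// Simplified stand-in for the RDF/JS Quad interface (assumption: the real
// type comes from @rdfjs/types and carries full term objects).
interface Quad {
    subject: string;
    predicate: string;
    object: string;
    graph: string;
}

// The input representations mentioned in this section.
type InputDataset = Set<Quad> | Quad[] | string;

// Shape of the detailed result, using the field names listed above.
// Assumption: blank node ids are plain strings.
interface DetailedResult {
    dataset: Set<Quad> | Quad[];
    dataset_nquad: string;
    bnode_id_map: Map<string, string>;
}

// Hypothetical helper: report which representation an input uses.
function inputKind(input: InputDataset): "nquads" | "array" | "set" {
    if (typeof input === "string") return "nquads";
    return Array.isArray(input) ? "array" : "set";
}

// Container of the returned dataset, following the rule stated above:
// Array in, Array out; Set or N-Quads document in, Set out.
function outputContainer(input: InputDataset): "array" | "set" {
    return inputKind(input) === "array" ? "array" : "set";
}

// An empty value that satisfies the result shape, for illustration only.
const emptyResult: DetailedResult = {
    dataset: new Set<Quad>(),
    dataset_nquad: "",
    bnode_id_map: new Map<string, string>(),
};
console.log(outputContainer(""), emptyResult.bnode_id_map.size);
```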
The separate testing folder includes a tiny application that runs the official specification tests, and can be used as an example for the additional packages that are required.
Installation
The usual npm installation can be used:
npm install rdfjs-c14n
The package has been written in TypeScript but is distributed in JavaScript; the type definition (i.e., index.d.ts) is included in the distribution.
Usage
More detailed documentation of the classes and types is available on GitHub. The basic usage may be as follows:
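import * as n3 from 'n3';
import * as rdf from 'rdf-js';
import { RDFC10, Quads } from 'rdfjs-c14n';

function main() {
    // Pass the RDF Data Factory explicitly (the n3 factory is also the default)
    const rdfc10 = new RDFC10(n3.DataFactory);
    // createYourQuads() stands for your own code producing a Set or Array of Quads
    const input: Quads = createYourQuads();
    // Canonicalized dataset, as Quad instances
    const normalized: Quads = rdfc10.canonicalizeDetailed(input).dataset;
    // Canonicalized dataset, as a sorted N-Quads document
    const normalized_N_Quads: string = rdfc10.canonicalizeDetailed(input).dataset_nquad;
    // Shortcut that returns the same N-Quads document directly
    const normalized_N_Quads_bis: string = rdfc10.canonicalize(input);
    // Hash of the canonicalized dataset
    const hash: string = rdfc10.hash(normalized);
}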
Alternatively, the canonicalization can rely on N-Quads documents only, with all other details hidden:
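import * as n3 from 'n3';
import * as rdf from 'rdf-js';
import { RDFC10, Quads, quadsToNquads } from 'rdfjs-c14n';

function main() {
    // No Data Factory given: the n3 factory is used by default
    const rdfc10 = new RDFC10();
    // fetchYourNQuadsDocument() stands for your own code returning an N-Quads string
    const input: string = fetchYourNQuadsDocument();
    // Canonical N-Quads document
    const normalized: string = rdfc10.canonicalize(input);
    const hash = rdfc10.hash(normalized);
}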
Additional features
Choice of hash
The RDFC 1.0 algorithm relies extensively on hashing. By default, as specified by the document, the hash function is 'sha256'. This default hash function can be changed via the
rdfc10.hash_algorithm = algorithm;
attribute, where algorithm can be any hash function identifier, for example 'sha256' or 'sha512'. The list of available hash algorithms can be retrieved as:
rdfc10.available_hash_algorithms;
which corresponds to what the underlying OpenSSL library of Node.js implements (as of June 2023, i.e., version 18.16.0).
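Because the available algorithms are said to mirror what Node.js exposes from OpenSSL, the same names can be inspected with Node's own crypto module. The sketch below does not use this package at all; isSupportedHash and hexDigest are hypothetical helper names, and checking a name this way before assigning it to hash_algorithm is a suggestion rather than a documented requirement.

```typescript
import { createHash, getHashes } from "crypto";

// Hypothetical helper: is the algorithm name known to this Node.js build?
function isSupportedHash(algorithm: string): boolean {
    return getHashes().indexOf(algorithm) !== -1;
}

// Hypothetical helper: hex digest of a string with the chosen algorithm.
function hexDigest(algorithm: string, data: string): string {
    if (!isSupportedHash(algorithm)) {
        throw new Error(`Unsupported hash algorithm: ${algorithm}`);
    }
    return createHash(algorithm).update(data, "utf8").digest("hex");
}

// A sha256 digest is 32 bytes, i.e., 64 hexadecimal characters.
console.log(isSupportedHash("sha256"), hexDigest("sha256", "abc").length);
```

A name that passes this check could then be assigned to rdfc10.hash_algorithm as described above.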
Controlling the recursion level
On rare occasions, the RDFC 1.0 algorithm has to go through some recursive steps. In even more extreme situations, running the algorithm could result in an unreasonably long canonicalization process. Although this almost never occurs in practice, attackers may use "poison graphs" to create such a situation (see the security considerations section in the specification).
This implementation sets a maximum level; this level can be accessed by the
rdfc10.maximum_allowed_recursion_level;
(read-only) attribute. This number can be lowered by setting the
rdfc10.maximum_recursion_level
attribute. The value of this attribute cannot exceed the system-wide maximum allowed level.
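The relation between the two attributes can be pictured with a small stand-alone class. This is an illustrative sketch, not the package's code: the class name RecursionLimit, the ceiling of 50, and the choice to silently ignore out-of-range values are all assumptions made for the example. The text above only states that the value cannot exceed the system-wide maximum.

```typescript
class RecursionLimit {
    // Assumed ceiling for this sketch; the real value is whatever
    // maximum_allowed_recursion_level reports.
    private static readonly CEILING = 50;
    private level = RecursionLimit.CEILING;

    // Read-only view of the ceiling.
    get maximum_allowed_recursion_level(): number {
        return RecursionLimit.CEILING;
    }

    get maximum_recursion_level(): number {
        return this.level;
    }

    // Accept only non-negative integers up to the ceiling.
    // Assumption: any other value is ignored rather than rejected with an error.
    set maximum_recursion_level(value: number) {
        const isWholeNumber = Math.floor(value) === value;
        if (isWholeNumber && value >= 0 && value <= RecursionLimit.CEILING) {
            this.level = value;
        }
    }
}

const limit = new RecursionLimit();
limit.maximum_recursion_level = 10;   // lowered
limit.maximum_recursion_level = 1000; // above the ceiling: ignored in this sketch
console.log(limit.maximum_recursion_level); // 10
```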
Logging
The canonicalization algorithm has built-in logging points that can be followed via a logger. This is only of interest for debugging the algorithm itself; it can be safely ignored by the average user. By default, no logging happens.
A built-in logger that displays logging information in YAML can be switched on. To use this YAML logger, do the following:
import { LogLevels } from 'rdfjs-c14n';
…
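function main() {
    …
    const rdfc10 = new RDFC10();
    // logLevel is expected to be one of the LogLevels values
    const logger = rdfc10.setLogger("YamlLogger", logLevel);
    …
    // Print the collected log (in YAML)
    console.log(logger.log);
}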
Implementers may add their own loggers to the system by implementing the Logger interface. See the interface specification for Logger, and the general documentation on how to add such a logger to the list of available loggers. If several loggers are available, the list of logger types is also accessible to the end user via:
rdfc10.available_logger_types;
Maintainer: @iherman