RDF Canonicalization in TypeScript
This is an implementation of the RDF Dataset Canonicalization algorithm, also referred to as RDFC-1.0. The algorithm has been published by the W3C RDF Dataset Canonicalization and Hash Working Group.
Requirements
RDF packages and references
The implementation depends on the interfaces defined by the RDF/JS Data model specification for RDF terms, named and blank nodes, or quads. It also depends on an instance of an RDF Data Factory, specified by the same specification. For TypeScript, the necessary type definitions are available through the `@rdfjs/types` package; an implementation of the RDF Data Factory is provided by, for example, the `n3` package (but there are others), which also provides a Turtle/TriG parser and serializer used to test the library.

By default (i.e., if not explicitly specified), the Data Factory of the `n3` package is used.
Crypto
The implementation relies on the Web Cryptography API as implemented by modern browsers, `deno` (version 1.3.82 or higher), or `node.js` (version 21 or higher). A side effect of using Web Crypto is that the canonicalization and hashing interface entries are all asynchronous and must be used, for example, through the `await` idiom of JavaScript/TypeScript.
Usage
An input RDF Dataset may be represented by any object that can be iterated through Quad instances (e.g., an array of Quads, a Set of Quads, or any specialized object wrapping Quads), or by a string representing an N-Quads, Turtle, or TriG document. Formally, the input can be:

```typescript
Iterable<rdf.Quad> | string
```

Note that it is expected, but not checked, that the `Iterable<rdf.Quad>` instance does not contain repeated Quads. If the input is a Turtle, TriG, or N-Quads document that is parsed by the system, duplicate quads are filtered out.
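The deduplication step can be pictured with a small, hypothetical helper that keys each quad on its serialized form; this is an illustration only, not the library's own code:

```typescript
// Hypothetical illustration of filtering duplicate quads, keyed on each
// quad's serialized form. Real Quad instances are distinct objects, so a
// plain Set<Quad> would not catch duplicates; a string key does.
type SimpleQuad = { subject: string; predicate: string; object: string; graph: string };

function deduplicate(quads: Iterable<SimpleQuad>): SimpleQuad[] {
    const seen = new Set<string>();
    const result: SimpleQuad[] = [];
    for (const q of quads) {
        const key = `${q.subject} ${q.predicate} ${q.object} ${q.graph}`;
        if (!seen.has(key)) {
            seen.add(key);
            result.push(q);
        }
    }
    return result;
}

const quad = { subject: '_:b0', predicate: '<p>', object: '"o"', graph: '' };
console.log(deduplicate([quad, { ...quad }]).length); // 1: the duplicate is dropped
```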
The canonicalization process can be invoked by:

- the `canonicalize` method, which returns an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids;
- the `canonicalizeDetailed` method, which returns an object of the form:
  - `canonicalized_dataset`: a Set of Quad instances, using the canonical blank node ids;
  - `canonical_form`: an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids;
  - `issued_identifier_map`: a `Map` object, mapping the original blank node ids (as used in the input) to their canonical equivalents;
  - `bnode_identifier_map`: a `Map` object, mapping a blank node to its (canonical) blank node id.
The separate testing folder includes a tiny application that runs some specification tests, and can be used as an example for the additional packages that are required.
Installation
For `node.js`, the usual `npm` installation can be used:

```
npm install rdfjs-c14n
```
The package has been written in TypeScript but is distributed in JavaScript; the type definitions (i.e., `index.d.ts`) are included in the distribution. Also, using appropriate tools (e.g., esbuild), the package can be bundled into a module that can be loaded into a browser.
For `deno`, a simple

```typescript
import { RDFC10, Quads, InputQuads } from "npm:rdfjs-c14n";
```

will do.
Usage Examples
There is more detailed documentation of the classes and types on GitHub. Basic usage may be as follows:
```typescript
import * as n3 from 'n3';
import * as rdf from '@rdfjs/types';
import { RDFC10, Quads, InputQuads } from 'rdfjs-c14n';

async function main() {
    // Any RDF/JS Data Factory can be used; n3's is the default anyway.
    const rdfc10 = new RDFC10(n3.DataFactory);

    // Any Iterable of Quads (or an N-Quads/Turtle/TriG string) is acceptable:
    const input: InputQuads = createYourQuads();

    // Detailed result: the canonical dataset and its N-Quads serialization.
    const result = await rdfc10.canonicalizeDetailed(input);
    const normalized: Quads = result.canonicalized_dataset;
    const normalized_N_Quads: string = result.canonical_form;

    // Shortcut returning the canonical N-Quads document directly:
    const normalized_N_Quads_bis: string = await rdfc10.canonicalize(input);

    // Hash of the canonical dataset:
    const hash: string = await rdfc10.hash(normalized);
}
```
Alternatively, the canonicalization can rely on N-Quads documents only, with all other details hidden:
```typescript
import { RDFC10 } from 'rdfjs-c14n';

async function main() {
    // Using the default Data Factory (n3's):
    const rdfc10 = new RDFC10();

    // The input is an N-Quads document:
    const input: string = fetchYourNQuadsDocument();

    // The canonical form, again as an N-Quads document:
    const normalized: string = await rdfc10.canonicalize(input);
    const hash: string = await rdfc10.hash(normalized);
}
```
Additional features
Choice of hash
The RDFC 1.0 algorithm makes extensive use of hashing. By default, as required by the specification, the hash function is 'sha256'.
This default hash function can be changed via the

```typescript
rdfc10.hash_algorithm = algorithm;
```

attribute, where `algorithm` can be any of the supported hash function identifiers, e.g., 'sha256' or 'sha512'. The list of available hash algorithms can be retrieved as:

```typescript
rdfc10.available_hash_algorithms;
```

which corresponds to the values defined by, and usually implemented by, the Web Cryptography API specification (as of December 2013), namely 'sha1', 'sha256', 'sha384', and 'sha512'.
Controlling the complexity level
On rare occasions, the RDFC 1.0 algorithm has to go through complex cycles that may also involve recursive steps. In even more extreme situations, running the algorithm could result in an unreasonably long canonicalization process. Although this practically never occurs, attackers may use such "poison graphs" to create these situations (see the security considerations section in the specification).
This implementation sets a maximum complexity level (usually set to 50); this level can be accessed via the

```typescript
rdfc10.maximum_allowed_complexity_number;
```

(read-only) attribute. The level actually used can be lowered by setting the

```typescript
rdfc10.maximum_complexity_number
```

attribute. The value of this attribute cannot exceed the system-wide maximum level.
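For instance, an application processing untrusted graphs may tighten the limit before canonicalizing. A minimal sketch, using the attribute names above:

```typescript
import { RDFC10 } from 'rdfjs-c14n';

const rdfc10 = new RDFC10();
// The system-wide ceiling (read-only):
console.log(rdfc10.maximum_allowed_complexity_number);
// Use a stricter limit for untrusted input; the value cannot exceed the ceiling.
rdfc10.maximum_complexity_number = 10;
```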
Logging
The canonicalization algorithm has built-in logging points that can be followed via a logger. This is only of interest for debugging the algorithm itself; it can be safely ignored by the average user. By default, no logging happens.
A built-in logger, which displays the logging information in YAML, can be switched on. To use this YAML logger:

```typescript
import { LogLevels } from 'rdfjs-c14n';
…
function main() {
    …
    const rdfc10 = new RDFC10();
    // logLevel is one of the LogLevels values, e.g., LogLevels.debug
    const logger = rdfc10.setLogger("YamlLogger", logLevel);
    …
    console.log(logger.log);
}
```
Implementers may add their own loggers to the system by implementing a new Logger instance. See the interface specification for Logger to implement your own logger, and the general documentation on how to add it to the list of available loggers. If more loggers are available, their list can be retrieved by the end user via:

```typescript
rdfc10.available_logger_types;
```

which returns the logger types included in the distribution.
Configurations
The default complexity value and the hash algorithm are both set in the code, see the configuration module.
Specific applications may want to let the user configure these values, e.g., via environment variables or configuration files. Doing so requires platform-specific features (e.g., file access) depending on where the algorithm runs (node.js, deno, or a browser), i.e., extra code that should not be part of the library itself. However, the library is prepared to run such an external configuration setting via a callback provided when constructing the RDFC10 instance, as follows:
```typescript
…
const rdfc10 = new RDFC10(null, getConfigData);
…
```

where `null` stands for a possible `DataFactory` instance (or `null` if the default is used) and `getConfigData` stands for a callback returning the configuration data. An example callback (using a combination of environment variables and configuration files, and relying on the `node.js` platform) is available and can easily be adapted to other platforms (e.g., `deno`). (A JavaScript version of the callback is also available.)
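A minimal callback might look as follows. This is a sketch only: the exact shape of the returned configuration object is defined by the library's configuration module, and the field names used below (`c14n_complexity`, `c14n_hash`) are assumptions.

```typescript
import { RDFC10 } from 'rdfjs-c14n';

// Hypothetical callback reading overrides from environment variables (node.js).
// The field names are assumptions; consult the library's configuration module
// for the exact shape of the configuration data.
function getConfigData() {
    return {
        c14n_complexity: Number(process.env.C14N_COMPLEXITY ?? 50),
        c14n_hash: process.env.C14N_HASH ?? 'sha256',
    };
}

const rdfc10 = new RDFC10(null, getConfigData);
```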
Maintainer: @iherman