RDF Canonicalization in TypeScript
This is an implementation of the RDF Dataset Canonicalization algorithm, also referred to as RDFC-1.0. (The algorithm is being specified by the W3C RDF Dataset Canonicalization and Hash Working Group.)
The specification is not yet final. This implementation aims to reflect the specification exactly, which means it may evolve alongside the specification even if the changes are only editorial.
Requirements
The implementation depends on the interfaces defined by the RDF/JS Data model specification for RDF terms, named and blank nodes, and quads. It also depends on an instance of an RDF Data Factory, as defined by the same specification. For TypeScript, the necessary type definitions are available through the @rdfjs/types package; an implementation of the RDF Data Factory is provided by, for example, the n3 package (there are others), which also provides a Turtle/TriG parser and serializer to test the library.
By default (i.e., if not explicitly specified), the Data Factory of the n3 package is used.
An input RDF Dataset may be represented by a Set of Quad instances, an Array of Quad instances, or a string containing an N-Quads document.
The canonicalization process can be invoked by:
- the canonicalize method, which returns an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids;
- the canonicalizeDetailed method, which returns an Object of the form:
  - canonicalized_dataset: a Set or Array of Quad instances, using the canonical blank node ids
  - canonical_form: an N-Quads document containing the (sorted) quads of the dataset, using the canonical blank node ids
  - issued_identifier_map: a Map object, mapping the original blank node ids (as used in the input) to their canonical equivalents
  - bnode_identifier_map: a Map object, mapping a blank node to its (canonical) blank node id
The form of canonicalized_dataset depends on the input:
- a Set or an Array of Quad instances, if the input was a Set or an Array, respectively;
- a Set of Quad instances, if the input was an N-Quads document.
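To illustrate what issued_identifier_map carries, the following minimal sketch relabels the blank nodes of an N-Quads document using such a map and sorts the resulting lines. The helper relabelNQuads is hypothetical and not part of the package. It assumes that the map keys and values are blank node ids without the `_:` prefix, and its simple pattern does not guard against `_:` sequences that happen to occur inside string literals.

```typescript
// Hypothetical helper, not part of rdfjs-c14n.
// Assumption: `issued` maps original blank node ids to canonical ids,
// both without the "_:" prefix.
function relabelNQuads(nquads: string, issued: Map<string, string>): string {
    const relabeled = nquads
        .split('\n')
        .filter((line) => line.trim().length > 0)
        .map((line) =>
            // Replace every blank node label that has an entry in the map;
            // labels without an entry are left untouched.
            line.replace(/_:([A-Za-z0-9]+)/g, (match: string, id: string) => {
                const canonical = issued.get(id);
                return canonical === undefined ? match : `_:${canonical}`;
            }),
        );
    // Default string sort (UTF-16 code unit order); sufficient for this ASCII sketch.
    relabeled.sort();
    return relabeled.map((line) => line + '\n').join('');
}
```

In real use the map comes from the result of canonicalizeDetailed; the sketch only shows how the original ids relate to the canonical ones.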
The separate testing folder includes a tiny application that runs the official specification tests, and can be used as an example for the additional packages that are required.
Installation
The usual npm
installation can be used:
npm install rdfjs-c14n
The package has been written in TypeScript but is distributed in JavaScript; the type definition (i.e., index.d.ts
) is included in the distribution.
Also, using appropriate tools (e.g., esbuild) the package can be included into a module that can be loaded into a browser.
Usage
More detailed documentation of the classes and types is available on GitHub. The basic usage may be as follows:
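import * as n3 from 'n3';
import * as rdf from 'rdf-js';
import { RDFC10, Quads } from 'rdfjs-c14n';
function main() {
    // Pass the Data Factory explicitly (the n3 factory is also the default)
    const rdfc10 = new RDFC10(n3.DataFactory);
    // createYourQuads() stands for application code producing a Set or Array of Quad instances
    const input: Quads = createYourQuads();
    // Detailed result: the canonicalized quads and their N-Quads serialization
    const result = rdfc10.canonicalizeDetailed(input);
    const normalized: Quads = result.canonicalized_dataset;
    const normalized_N_Quads: string = result.canonical_form;
    // Shortcut that returns the N-Quads serialization only
    const normalized_N_Quads_bis: string = rdfc10.canonicalize(input);
    const hash: string = rdfc10.hash(normalized);
}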
Alternatively, the canonicalization can rely on N-Quads documents only, with all other details hidden:
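import * as n3 from 'n3';
import * as rdf from 'rdf-js';
import { RDFC10, Quads, quadsToNquads } from 'rdfjs-c14n';
function main() {
    // No Data Factory is given, so the default (n3) is used
    const rdfc10 = new RDFC10();
    // fetchYourNQuadsDocument() stands for application code returning an N-Quads string
    const input: string = fetchYourNQuadsDocument();
    const normalized: string = rdfc10.canonicalize(input);
    const hash = rdfc10.hash(normalized);
}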
Additional features
Choice of hash
The RDFC 1.0 algorithm relies extensively on hashing. By default, as required by the specification, the hash function is 'sha256'. This default can be changed via the
rdfc10.hash_algorithm = algorithm;
attribute, where algorithm identifies any available hash function, e.g., 'sha256' or 'sha512'. The list of available hash algorithms can be retrieved as:
rdfc10.available_hash_algorithms;
which corresponds to the values accepted by the underlying crypto-js npm package (version 4.1.1, as of July 2023).
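A caller may want to check that an algorithm is available before selecting it. The sketch below does this against a minimal structural interface. HashConfigurable and selectHashAlgorithm are illustrative names, and the assumption that available_hash_algorithms is an array of strings is not confirmed by the text above.

```typescript
// Minimal structural view of the two attributes described above.
// Assumption: available_hash_algorithms is an array of strings.
interface HashConfigurable {
    hash_algorithm: string;
    readonly available_hash_algorithms: string[];
}

// Hypothetical helper: sets the algorithm only if it is listed as available.
// Returns true when the algorithm was set, false otherwise.
function selectHashAlgorithm(target: HashConfigurable, wanted: string): boolean {
    if (target.available_hash_algorithms.indexOf(wanted) === -1) {
        return false;
    }
    target.hash_algorithm = wanted;
    return true;
}
```

With an RDFC10 instance the call would read selectHashAlgorithm(rdfc10, 'sha512'), provided the instance matches the assumed shape.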
Controlling the complexity level
On rare occasions, the RDFC 1.0 algorithm has to go through complex cycles that may also involve recursive steps. In even more extreme situations, the algorithm could take an unreasonably long time to run. Although this hardly ever occurs in practice, attackers may use "poison graphs" to create such situations (see the security considerations section in the specification).
This implementation sets a maximum complexity level, which can be accessed via the
rdfc10.maximum_allowed_complexity_number;
(read-only) attribute. The level can be lowered by setting the
rdfc10.maximum_complexity_number
attribute, whose value cannot exceed the system-wide maximum.
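The text does not say what happens when a value above the system-wide maximum is assigned, so a cautious caller can clamp the value first. The following sketch works against a minimal structural interface; ComplexityConfigurable and lowerComplexity are illustrative names, not part of the package.

```typescript
// Minimal structural view of the two attributes described above.
interface ComplexityConfigurable {
    readonly maximum_allowed_complexity_number: number;
    maximum_complexity_number: number;
}

// Hypothetical helper: applies the requested level, but never more than
// the system-wide maximum. Returns the level that was actually applied.
function lowerComplexity(target: ComplexityConfigurable, wanted: number): number {
    const ceiling = target.maximum_allowed_complexity_number;
    const applied = Math.min(wanted, ceiling);
    target.maximum_complexity_number = applied;
    return applied;
}
```

For example, lowerComplexity(rdfc10, 10) would request a level of 10, assuming the instance exposes both attributes as numbers.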
Logging
The canonicalization algorithm has built-in logging points that can be followed via a logger. This is only of interest for debugging the algorithm itself; it can be safely ignored by the average user. By default, no logging happens.
A built-in logger can be switched on which displays logging information in YAML. To use this YAML logger, do the following:
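import { RDFC10, LogLevels } from 'rdfjs-c14n';
…
function main() {
    …
    const rdfc10 = new RDFC10();
    // logLevel is one of the LogLevels values
    const logger = rdfc10.setLogger("YamlLogger", logLevel);
    …
    // Display the accumulated log, in YAML
    console.log(logger.log);
}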
Implementers may add their own loggers to the system by implementing a new Logger instance. See the interface specification for Logger, and the general documentation on how to add such a logger to the list of available loggers. The list of available loggers is accessible to the end user via:
rdfc10.available_logger_types;
which returns the loggers that are included in the distribution.
Configurations
The default complexity value and the hash algorithm are both set in the code; see the configuration module.
Specific applications may want to let users configure these values, e.g., via environment variables or configuration files. This requires features (e.g., file access) that depend on the platform used to run the algorithm (e.g., Node.js, Deno, or a browser), i.e., extra code that should not be included in the library. However, the library can obtain such external configuration via a callback passed when constructing the RDFC10 instance, as follows:
…
const rdfc10 = new RDFC10(null, getConfigData);
…
where the first argument is a DataFactory instance (or null if the default is used) and getConfigData stands for a callback returning the configuration data. An example callback (using a combination of environment variables and configuration files and relying on the Node.js platform) is available, and can be easily adapted to other platforms (e.g., Deno). (A JavaScript version of the callback is also available.)
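As a rough idea of what such a callback might look like, the sketch below builds one from a set of environment variables. Everything specific in it is an assumption: the shape of the returned object (ConfigSketch and its field names), the variable names C14N_COMPLEXITY and C14N_HASH, and the factory makeGetConfigData are hypothetical. The example callback shipped with the package should be consulted for the actual data format.

```typescript
// Hypothetical shape of the configuration data; the field names are assumptions.
interface ConfigSketch {
    complexity?: number;
    hash_algorithm?: string;
}

// Environment variables are passed in explicitly to keep the sketch platform-neutral.
type Env = { [name: string]: string | undefined };

// Hypothetical factory returning a callback that reads two (assumed) variables.
function makeGetConfigData(env: Env): () => ConfigSketch {
    return () => {
        const config: ConfigSketch = {};
        // Accept the complexity only if it parses to a positive, finite number.
        const complexity = Number(env['C14N_COMPLEXITY']);
        if (isFinite(complexity) && complexity > 0) {
            config.complexity = complexity;
        }
        // Accept the hash algorithm only if it is a non-empty string.
        const hash = env['C14N_HASH'];
        if (hash !== undefined && hash.trim() !== '') {
            config.hash_algorithm = hash.trim();
        }
        return config;
    };
}
```

On Node.js the callback could be created with makeGetConfigData(process.env) and passed as the second constructor argument, assuming the library accepts data of this shape.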
Maintainer: @iherman