EntrySync - library for synchronizing entries with metadata from various sources
The EntryScape platform, with its backend EntryStore, relies on entries that may contain a resource, metadata, and external metadata. At the heart of this library is a mechanism that synchronizes the metadata of entries by means of metadata fingerprinting.
Installation
Dependencies are installed by running yarn.
Synchronization patterns
The core functionality of the library lets you build a custom synchronization mechanism. However, most cases can be covered by the following established synchronization patterns. The patterns are listed below, each together with its corresponding CLI command.
Graph synchronization pattern - src/graph/graphSync.js
This pattern takes a single graph as input, breaks it up into smaller graphs centered around entities, and synchronizes each of them as an entry.
The algorithm for breaking up the graph detects entities based on rdf:type, includes all outgoing triples of each entity, and then repeats the procedure for every blank node in object position.
CLI command:
cd cli
node graphSync.js config.js
The config.js file has to be provided; see the example in cli/graphSync_exampleConfig.js.
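The graph splitting described above can be sketched as follows. This is a minimal illustration and not the code in src/graph/graphSync.js: triples are plain { s, p, o } objects, blank nodes are assumed to carry a "_:" prefix, only named (non-blank) subjects with an rdf:type are treated as entities, and the names splitGraph and collectEntityGraph are hypothetical.

```javascript
// Illustrative sketch only: plain objects stand in for the RDF model used by the library.
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

// Assumption: blank nodes are written with a "_:" prefix.
const isBlank = (term) => term.startsWith('_:');

// Collect all outgoing triples of `subject`, then repeat for blank nodes in object position.
function collectEntityGraph(triples, subject, seen = new Set()) {
  if (seen.has(subject)) return []; // guard against cycles between blank nodes
  seen.add(subject);
  const result = [];
  for (const t of triples) {
    if (t.s !== subject) continue;
    result.push(t);
    if (isBlank(t.o)) result.push(...collectEntityGraph(triples, t.o, seen));
  }
  return result;
}

// Break one graph into smaller graphs, one per detected entity.
function splitGraph(triples) {
  const entities = new Set(
    triples.filter((t) => t.p === RDF_TYPE && !isBlank(t.s)).map((t) => t.s)
  );
  const graphs = new Map();
  for (const entity of entities) {
    graphs.set(entity, collectEntityGraph(triples, entity));
  }
  return graphs;
}

// Example data (made up for illustration).
const triples = [
  { s: 'ex:dataset1', p: RDF_TYPE, o: 'dcat:Dataset' },
  { s: 'ex:dataset1', p: 'dcterms:temporal', o: '_:b1' },
  { s: '_:b1', p: 'dcat:startDate', o: '"2020-01-01"' },
  { s: 'ex:publisher1', p: RDF_TYPE, o: 'foaf:Agent' },
  { s: 'ex:publisher1', p: 'foaf:name', o: '"Example Org"' },
];
const graphs = splitGraph(triples);
console.log([...graphs.keys()]);
```

In this sketch the triple hanging off the blank node _:b1 ends up in the graph of ex:dataset1, since the blank node has no identity of its own to synchronize against.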
Type based synchronization pattern - src/context/typeSync.js
This pattern synchronizes entries in one context with another context (potentially in another EntryStore instance). Detection of entries is based on one or several classes (rdf:type).
CLI command:
cd cli
node typeSync.js config.js
The config.js file has to be provided; see the example in cli/typeSync_exampleConfig.js.
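Detection by class can be pictured with the small sketch below. It works on hypothetical in-memory entries rather than on an EntryStore context, and detectByType is an illustrative name, not a function of the library.

```javascript
// Hypothetical in-memory entries; the real pattern reads entries from a source context.
const sourceEntries = [
  { uri: 'ex:dataset1', types: ['dcat:Dataset'] },
  { uri: 'ex:agent1', types: ['foaf:Agent'] },
  { uri: 'ex:concept1', types: ['skos:Concept'] },
];

// Select the entries whose rdf:type matches at least one of the given classes.
function detectByType(entries, classes) {
  const wanted = new Set(classes);
  return entries.filter((entry) => entry.types.some((type) => wanted.has(type)));
}

const selected = detectByType(sourceEntries, ['dcat:Dataset', 'foaf:Agent']);
console.log(selected.map((entry) => entry.uri));
```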
Traversal synchronization pattern - src/context/traverseSync.js
This pattern synchronizes entries in one context with another context (potentially in another EntryStore instance). Detection of entries is based on an initial starting point of one or several entries and includes all entries reachable via a set of properties.
CLI command:
cd cli
node traverseSync.js config.js
The config.js file has to be provided; see the example in cli/traverseSync_exampleConfig.js.
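The traversal idea can be sketched as a breadth-first walk from the starting entries that follows only the configured properties. The data structure and the name traverse below are assumptions made for illustration; the actual implementation in src/context/traverseSync.js may differ.

```javascript
// Hypothetical in-memory view of a context: entry URI -> list of { p, o } statements.
const context = new Map([
  ['ex:catalog', [{ p: 'dcat:dataset', o: 'ex:dataset1' }]],
  ['ex:dataset1', [
    { p: 'dcat:distribution', o: 'ex:dist1' },
    { p: 'dcterms:publisher', o: 'ex:agent1' },
  ]],
  ['ex:dist1', []],
  ['ex:agent1', []],
]);

// Return the start entries plus every entry reachable via the given properties.
function traverse(ctx, startUris, properties) {
  const follow = new Set(properties);
  const visited = new Set();
  const queue = [...startUris];
  while (queue.length > 0) {
    const uri = queue.shift();
    // Skip entries already visited and objects that are not entries in this context.
    if (visited.has(uri) || !ctx.has(uri)) continue;
    visited.add(uri);
    for (const { p, o } of ctx.get(uri)) {
      if (follow.has(p)) queue.push(o);
    }
  }
  return [...visited];
}

const reachable = traverse(context, ['ex:catalog'], ['dcat:dataset', 'dcat:distribution']);
console.log(reachable);
```

Here ex:agent1 is left out because dcterms:publisher is not among the properties to follow.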
Core functionality
The following classes are central to how the synchronization works:
src/EntrySync.js - Handles synchronizing metadata as entries in an EntryStore instance. It uses EntityIndex and DuplicateIndex to steer what should be synchronized.
src/EntityIndex.js - Maintains an index of synchronized entries with their corresponding metadata fingerprints. The index speeds up consecutive synchronizations and can be persisted on disk.
src/DuplicateIndex.js - Keeps track of which entities have already been synchronized and blocks them from being duplicated.
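The role of a fingerprint index can be illustrated with the sketch below. It assumes that a fingerprint is a SHA-256 hash over the sorted, serialized triples of an entity; the actual fingerprinting used by the library is not described here and may differ (for example in how blank nodes are handled). The names fingerprint, needsSync, and index are hypothetical and do not reflect the API of EntityIndex.

```javascript
const { createHash } = require('node:crypto');

// Assumption: the fingerprint is a SHA-256 hash over the sorted, serialized triples.
// Sorting makes the result independent of the order in which triples arrive.
function fingerprint(triples) {
  const lines = triples.map((t) => `${t.s} ${t.p} ${t.o}`).sort();
  return createHash('sha256').update(lines.join('\n')).digest('hex');
}

// Hypothetical index: entity URI -> fingerprint stored by the previous synchronization.
const index = new Map();

// Return true when the metadata changed since the last run, and record the new fingerprint.
function needsSync(uri, triples) {
  const current = fingerprint(triples);
  if (index.get(uri) === current) return false; // unchanged, skip the write
  index.set(uri, current);
  return true;
}

// Example metadata (made up for illustration).
const metadata = [
  { s: 'ex:dataset1', p: 'dcterms:title', o: '"Air quality"' },
  { s: 'ex:dataset1', p: 'dcterms:issued', o: '"2020-01-01"' },
];
console.log(needsSync('ex:dataset1', metadata)); // true: not seen before
console.log(needsSync('ex:dataset1', [...metadata].reverse())); // false: same triples, other order
```

Persisting such an index between runs is what lets a later synchronization skip entries whose metadata has not changed.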