
ckanext-versioned-datastore
A CKAN extension providing a versioned datastore using MongoDB and Elasticsearch
This plugin provides a complete replacement for CKAN's datastore plugin and therefore shouldn't be used in conjunction with it. Rather than storing data in PostgreSQL, resource data is stored in MongoDB and then made available to frontend APIs using Elasticsearch.
This allows this plugin to:
This plugin is built on Splitgill.
Path variables used below:
$INSTALL_FOLDER (i.e. where CKAN is installed), e.g. /usr/lib/ckan/default
$CONFIG_FILE, e.g. /etc/ckan/default/development.ini
Install from PyPI:
pip install ckanext-versioned-datastore
Or, to install from source, clone the repository into the src folder:
cd $INSTALL_FOLDER/src
git clone https://github.com/NaturalHistoryMuseum/ckanext-versioned-datastore.git
Activate the virtual env:
. $INSTALL_FOLDER/bin/activate
Install via pip:
pip install $INSTALL_FOLDER/src/ckanext-versioned-datastore
Installing from a pyproject.toml in editable mode (i.e. pip install -e) requires setuptools>=64; however, CKAN 2.9 requires setuptools==44.1.0. See our CKAN fork for a version of v2.9 that uses an updated setuptools if this functionality is something you need.
Add 'versioned_datastore' to the list of plugins in your $CONFIG_FILE:
ckan.plugins = ... versioned_datastore
Install lessc globally:
npm install -g "less@~4.1"
At the version of splitgill this plugin uses, you will also need to install:
This plugin also requires CKAN's job queue, which is included in recent versions of CKAN or can be added to older versions using the ckanext-rq plugin.
There are a number of options that can be specified in your .ini config file. All configuration options are currently required.
| Name | Description | Example |
|---|---|---|
| ckanext.versioned_datastore.elasticsearch_hosts | A comma-separated list of Elasticsearch server hosts | 1.2.3.4,1.5.4.3,es.mydomain.local |
| ckanext.versioned_datastore.elasticsearch_port | The port to use for the Elasticsearch server hosts listed in the elasticsearch_hosts option | 9200 |
| ckanext.versioned_datastore.elasticsearch_index_prefix | The prefix to use for index names in Elasticsearch. Each resource in the datastore gets an index, and the name of the index is the resource ID with this prefix prepended. | nhm- |
| ckanext.versioned_datastore.mongo_host | The MongoDB server host | 10.54.24.10 |
| ckanext.versioned_datastore.mongo_port | The port to use to connect to the MongoDB host | 27017 |
| ckanext.versioned_datastore.mongo_database | The name of the MongoDB database in which to store datastore data | nhm |
| Name | Description | Example |
|---|---|---|
| ckanext.versioned_datastore.redis_host | The redis server host. If this is provided slugging is enabled | 14.1.214.50 |
| ckanext.versioned_datastore.redis_port | The port to use to connect to the redis host | 6379 |
| ckanext.versioned_datastore.redis_database | The redis database index to use to store datastore multisearch slugs in | 1 |
| ckanext.versioned_datastore.slug_ttl | The amount of time slugs should last for, in days. Default: 7 | 7 |
| ckanext.versioned_datastore.dwc_core_extension_name | The name of the DwC core extension to use, as defined in dwc/writer.py. | gbif_occurrence |
| ckanext.versioned_datastore.dwc_extension_names | A comma-separated list of (non-core) DwC extension names, as defined in dwc/writer.py. | gbif_multimedia |
| ckanext.versioned_datastore.dwc_org_name | The organisation name to use in DwC-A metadata. Default: the value of ckanext.doi.publisher or ckan.site_title | The Natural History Museum |
| ckanext.versioned_datastore.dwc_org_email | The contact email to use in DwC-A metadata. Default: the value of smtp.mail_from | contact@yoursite.com |
| ckanext.versioned_datastore.dwc_default_license | The license to use in DwC-A metadata if the resources have differing licenses or no license is specified. Default: null | http://creativecommons.org/publicdomain/zero/1.0/legalcode |
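As a rough illustration of how the connection settings above fit together, the following sketch reads ini text, checks that the six options from the first table are present, and derives the Elasticsearch host addresses and a per-resource index name. The helper names (`load_vds_settings`, `elasticsearch_addresses`, `index_name_for`), the `host:port` address format, and the sample values are assumptions for illustration only; the plugin reads its configuration through CKAN itself, not through code like this.

```python
import configparser

PREFIX = "ckanext.versioned_datastore."

# Options listed in the first table above (assumed here to be the required ones).
REQUIRED = [
    "elasticsearch_hosts",
    "elasticsearch_port",
    "elasticsearch_index_prefix",
    "mongo_host",
    "mongo_port",
    "mongo_database",
]


def load_vds_settings(ini_text, section="app:main"):
    """Return the plugin's options from ini text, with the common prefix removed.

    Hypothetical helper: raises ValueError if any option in REQUIRED is missing.
    """
    # interpolation=None so values such as %(here)s in a CKAN ini do not raise
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    options = {
        key[len(PREFIX):]: value
        for key, value in parser[section].items()
        if key.startswith(PREFIX)
    }
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    return options


def elasticsearch_addresses(options):
    """Pair each host in the comma-separated list with the shared port."""
    hosts = [h.strip() for h in options["elasticsearch_hosts"].split(",") if h.strip()]
    return [f"{host}:{options['elasticsearch_port']}" for host in hosts]


def index_name_for(options, resource_id):
    """Index name is the configured prefix followed by the resource ID."""
    return options["elasticsearch_index_prefix"] + resource_id


SAMPLE_INI = """
[app:main]
ckan.plugins = versioned_datastore
ckanext.versioned_datastore.elasticsearch_hosts = 1.2.3.4,1.5.4.3,es.mydomain.local
ckanext.versioned_datastore.elasticsearch_port = 9200
ckanext.versioned_datastore.elasticsearch_index_prefix = nhm-
ckanext.versioned_datastore.mongo_host = 10.54.24.10
ckanext.versioned_datastore.mongo_port = 27017
ckanext.versioned_datastore.mongo_database = nhm
"""

settings = load_vds_settings(SAMPLE_INI)
print(elasticsearch_addresses(settings))
print(index_name_for(settings, "example-resource-id"))
```

A check like this can be handy in deployment scripts to catch a missing option before CKAN starts, but it is not part of the extension.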
A brief tour!
The plugin automatically detects resources on upload that can be added to the datastore. This is accomplished using the resource format. Currently the accepted formats are:
If one of these formats is used then an attempt will be made to add the uploaded file or URL to the datastore. Note that only the first sheet in multisheet XLS and XLSX files will be processed.
Adding data to the datastore is accomplished in two steps:
The ingesting and indexing are completed in the background using CKAN's job queue.
Once data has been added to the datastore it can be searched using the datastore_search or more advanced datastore_search_raw actions.
The datastore_search action closely mirrors the default CKAN datastore action of the same name.
The datastore_search_raw action allows users to query the datastore using raw Elasticsearch queries, unlocking the full range of features it provides.
All of this extension's actions are fully documented, including all parameters and results.
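To make the two search actions more concrete, here is a minimal sketch that builds (but does not send) HTTP requests for them through CKAN's action API, which exposes actions at /api/3/action/<action_name>. The site URL, the resource ID, the `build_action_request` helper, and the parameter names used in the payloads (`resource_id`, `q`, `limit`, `search`) are assumptions for illustration; check the extension's action documentation for the parameters each action actually accepts.

```python
import json
import urllib.request


def build_action_request(site_url, action, payload):
    """Build a JSON POST request for a CKAN action (hypothetical helper)."""
    url = f"{site_url.rstrip('/')}/api/3/action/{action}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


SITE_URL = "https://ckan.example.org"  # placeholder site
RESOURCE_ID = "example-resource-id"    # placeholder resource

# Simple search, mirroring the shape of CKAN's own datastore_search
# (parameter names are assumed, not confirmed by this README).
simple = build_action_request(
    SITE_URL,
    "datastore_search",
    {"resource_id": RESOURCE_ID, "q": "banana", "limit": 10},
)

# Raw search: the payload carries an Elasticsearch query body.
# match_all is standard Elasticsearch query DSL; "search" as the
# parameter name is an assumption.
raw = build_action_request(
    SITE_URL,
    "datastore_search_raw",
    {"resource_id": RESOURCE_ID, "search": {"query": {"match_all": {}}}},
)

for request in (simple, raw):
    print(request.get_method(), request.full_url)

# To actually send a request against a running site:
#     with urllib.request.urlopen(simple) as response:
#         result = json.load(response)["result"]
```

Building the request separately from sending it keeps the sketch runnable without a CKAN instance; in practice a client library or CKAN's own toolkit could be used instead.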
vds
initdb: ensure the tables needed by this plugin exist.
ckan -c $CONFIG_FILE initdb
reindex: reindex either a specific resource or all resources.
ckan -c $CONFIG_FILE reindex $OPTIONAL_RESOURCE_ID
IVersionedDatastore
This is the most general interface.
Here is a brief overview of its functions:
datastore_modify_data_dict - allows modification of the data dict before it is validated and used to create the search object
datastore_modify_search - allows modifications to the search before it is made. This is roughly analogous to IDatastore.datastore_search; however, instead of a query dict, an elasticsearch-dsl Search object is passed around
datastore_modify_result - allows modifications to the result after the search
datastore_modify_fields - allows modification of the field definitions before they are returned with the results of a datastore_search
datastore_modify_index_doc - allows the modification of a resource's data during indexing
datastore_is_read_only_resource - allows implementors to designate certain resources as read only
datastore_after_indexing - allows implementors to hook onto the completion of an indexing task
See the interface definition in this plugin for more details about these functions.
IVersionedDatastoreQuery
This interface handles hooks and functions specifically relating to search queries.
get_query_schemas - allows registering custom query schemas
IVersionedDatastoreDownloads
This interface handles hooks and functions specifically relating to downloads.
download_modify_notifier_start_templates - modify the templates used when sending notifications that a download has started
download_modify_notifier_end_templates - modify the templates used when sending notifications that a download has ended
download_modify_notifier_error_templates - modify the templates used when sending notifications that a download has failed
download_modify_notifier_template_context - modify the context/variables used to populate the notification templates
download_derivative_generators - extend or modify the list of derivative generators
download_file_servers - extend or modify the list of file servers
download_notifiers - extend or modify the list of notifiers
download_data_transformations - extend or modify the list of data transformations
download_modify_manifest - modify the manifest included in the download file
download_before_run - modify args before any search is run or files generated
download_after_run - hook notifying that a download has finished (whether failed or completed)
There is a Docker compose configuration available in this repository to make it easier to run tests. The ckan image uses the Dockerfile in the docker/ folder.
To run the tests against CKAN 2.9.x on Python 3:
Build the required images:
docker compose build
Then run the tests. The root of the repository is mounted into the ckan container as a volume by the Docker compose configuration, so you should only need to rebuild the ckan image if you change the extension's dependencies.
docker compose run ckan