# @sanity/import

Imports documents from an ndjson stream into a Sanity dataset
## Installing

```sh
npm install --save @sanity/import
```
## Usage

```js
const fs = require('fs')
const sanityClient = require('@sanity/client')
const sanityImport = require('@sanity/import')

const client = sanityClient({
  projectId: '<your project id>',
  dataset: '<your target dataset>',
  token: '<token-with-write-perms>',
  useCdn: false,
})

const input = fs.createReadStream('my-documents.ndjson')

const options = {
  // A client instance preconfigured with the target project, dataset
  // and a token with write permissions
  client: client,

  // `create`, `createOrReplace` or `createIfNotExists`
  operation: 'create',

  // Called on every progress event
  onProgress: (progress) => {
    // report progress somewhere
  },

  // Whether or not to allow assets located in different datasets
  allowAssetsInDifferentDataset: false,

  // Whether or not to allow assets that fail to be fetched/uploaded
  allowFailingAssets: false,

  // Whether or not to replace existing assets with the same hash
  replaceAssets: false,

  // Whether or not to skip references to other datasets
  skipCrossDatasetReferences: false,

  // Whether or not to import system documents
  allowSystemDocuments: false,
}

sanityImport(input, options)
  .then(({numDocs, warnings}) => {
    console.log('Imported %d documents', numDocs)
  })
  .catch((err) => {
    console.error('Import failed: %s', err.message)
  })
```
## CLI tool

This functionality is built into the `sanity` package as `sanity dataset import`, but is also usable through the `sanity-import` CLI tool that ships with this package:
```sh
$ sanity-import --help

  CLI tool that imports documents from an ndjson file or URL

  Usage
    $ sanity-import -p <projectId> -d <dataset> -t <token> sourceFile.ndjson

  Options
    -p, --project <projectId> Project ID to import to
    -d, --dataset <dataset> Dataset to import to
    -t, --token <token> Token to authenticate with
    --asset-concurrency <concurrency> Number of parallel asset imports
    --replace Replace documents with the same IDs
    --missing Skip documents that already exist
    --allow-failing-assets Skip assets that cannot be fetched/uploaded
    --replace-assets Skip reuse of existing assets
    --skip-cross-dataset-references Skip references to other datasets
    --help Show this help

  Examples
    # Import "./my-dataset.ndjson" into dataset "staging"
    $ sanity-import -p myPrOj -d staging -t someSecretToken my-dataset.ndjson

    # Import into dataset "test" from stdin, read token from env var
    $ cat my-dataset.ndjson | sanity-import -p myPrOj -d test -

  Environment variables (fallbacks for missing flags)
    --token = SANITY_IMPORT_TOKEN
```
## Future improvements

- When documents are imported, record which IDs are actually touched
  - Only upload assets for documents that are still within that window
  - Only strengthen references for documents that are within that window
  - Only count the number of imported documents from within that window
- Asset uploads and reference strengthening can be done in parallel, but we need a way to cancel the operations if one of them fails
- Introduce retrying of asset uploads based on hash + indexing delay
- Validate that the dataset exists upon start
- Reference verification:
  - Create a set of all document IDs in the import file
  - Create a set of all document IDs in references
  - Create a set of referenced IDs that do not exist locally
  - Batch-wise, check if documents with missing IDs exist remotely
  - When all missing IDs have been cross-checked with the remote API (or a max of, say, 100 items have been found missing), reject with a useful error message
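The planned reference-verification steps above boil down to a set difference between referenced IDs and locally defined IDs. A minimal sketch of that step in plain JavaScript (the IDs are made up, and this is not part of the package's API):

```javascript
// IDs of documents defined in the import file (illustrative values)
const localIds = new Set(['movie_1', 'person_1'])

// IDs that documents in the import file reference
const referencedIds = new Set(['person_1', 'person_2'])

// References with no matching local document would then be checked
// remotely, batch-wise, before the import proceeds
const missingIds = [...referencedIds].filter((id) => !localIds.has(id))
console.log(missingIds)
```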
## License

MIT-licensed. See LICENSE.