This package is a Drupal json:api client library with one primary responsibility: to crawl a Drupal-produced json:api and save the resulting data to static json files, organized in a directory structure that makes the files easy to access.
Why all the trouble? For Drupal sites with only hundreds or low thousands of pages (the majority), enabling the (now core) json:api module in conjunction with this library allows for fully static front ends. Having a way to export all of a site's data to static json files means those files can be deployed, statically, alongside the site's decoupled front end.
It also presents an opportunity to transform the standard json:api output into something a little friendlier for developers to work with. Ideally, this library is used during the static generation process.
## Getting started
Crawling all Drupal nodes of a given content type, along with each node's associated relationships (including paragraphs), is pretty easy:
```js
const { Spider } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })

// crawl one content type (blog nodes and their relationships)
spider.crawl('/node/blog')
// ...or crawl all node content types
spider.crawlNodes()
```
While the above `Spider` crawls through an entire set of content types, it does not actually do anything with the results. This is where the `Extractor` object comes in.
```js
const { Spider, Extractor } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, { location: './downloads' })

extractor.wipe().then(() => spider.crawl('/node/content-type'))
```
Note: the extractor has a helpful utility method `wipe()`, which returns a `Promise` and ensures the target directory is completely empty before resolving.
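Since `wipe()` returns a `Promise`, the same sequence can also be written with `async`/`await` (a minimal sketch, assuming it runs inside an `async` function):

```js
// empty the target directory, then start the crawl
await extractor.wipe()
spider.crawl('/node/content-type')
```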
Running the crawl will output a new `downloads` directory with the following structure:
```
downloads/
  _resources/
    node/
      blog/
        0ef56bbd-b2d6-475e-8b83-e1fa9bc1e7fb.json
    paragraph/
      hero/
        425a6dc1-5158-4f12-8d54-eb8a7af369f0.json
    taxonomy_term/
      tags/
        2d850e4b-9d2f-4b8f-b1e7-ad959de8b393.json
  _slugs/
    node/
      1.json
    blogs/
      my-first-blog-post.json
```
This structure is intended to serve static sites well by allowing lookup by a resource's globally unique json:api id, as well as by the more traditional Drupal path (`node/1`) and a node's alias "slug" (`/blogs/my-first-blog-post`).
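For example, a decoupled front end could resolve a page by its slug with a single request against the static files (a minimal sketch, assuming the contents of `downloads` are served from the site root; `loadBySlug` is a hypothetical helper):

```js
// look up a node's json by its path alias ("slug")
async function loadBySlug (slug) {
  const res = await fetch(`/_slugs/${slug}.json`)
  if (!res.ok) throw new Error(`No static file for slug: ${slug}`)
  return res.json()
}

loadBySlug('blogs/my-first-blog-post').then(node => console.log(node))
```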
By default the extractor saves the exact output of the json:api. However, when developing your decoupled front end you may prefer a slightly less verbose json schema. This package includes a transformer that makes it easy to "clean" the output:
```js
const extractor = new Extractor(spider, {
  location: './downloads',
  clean: true
})
```
Sometimes it is nice to see the progress of the download process. This package
includes a console logger as well.
```js
const { Spider, Extractor, Logger } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, { location: './downloads' })
const logger = new Logger([spider, extractor])

spider.crawl('/node/content-type')
```
The logger in our example would print to the command line:
```
✔️ node: 1
✔️ taxonomy_term: 1
✔️ paragraph: 1
----------------------------
🎉 Crawl complete!
Errors.................0
node...................1
paragraph..............1
taxonomy_term..........1
```
## Configuration options
Each of the provided classes has a number of configuration options.
### Spider
You pass options as the first argument when instantiating a new `Spider`, i.e. `new Spider(options)`. The full set of options, shown with illustrative values:

```js
{
  baseURL: 'https://example.org/jsonapi/',
  api: axios, // the http client used to make requests
  terminateOnError: false,
  maxConcurrent: 5, // maximum number of simultaneous requests
  resourceConfig: {
    // which relationship keys the spider should follow
    relationships: [
      new RegExp(/^field_/)
    ]
  }
}
```
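For instance, to throttle the crawl and follow a relationship key that does not match the `field_` convention (the `og_audience` pattern below is purely hypothetical), you might pass:

```js
const spider = new Spider({
  baseURL: 'https://example.org/jsonapi/',
  maxConcurrent: 2, // be gentler on the server
  resourceConfig: {
    // follow the usual field_* relationships plus one custom key
    relationships: [/^field_/, /^og_audience$/]
  }
})
```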
### Extractor

You pass options as the second argument when instantiating a new `Extractor`.
```js
const extractor = new Extractor(spider, options)

extractor.wipe().then(() => spider.crawlNodes())
extractor.wipe().then(() => spider.crawlNodes(5))
```
Note: as before, the `wipe()` utility method returns a `Promise` and ensures the target directory is completely empty before resolving. The full set of options, shown with illustrative values:
```js
{
  location: './',
  clean: false,
  pretty: false,
  transformer: transformer({
    attributeFilters: [
      /^field_/,
      /^(title|created|changed|langcode|body)$/,
      /^(name|weight|description)$/,
      /^(parent_type|parent_id)$/
    ],
    relationshipFilters: [
      /^field_/
    ],
    fieldPropertyFilters: [
      /^links$/
    ],
    cleanFields: callback // a user-supplied function
  })
}
```
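As a small example, to write human-readable files into a custom directory (assuming, as the option name suggests, that `pretty` controls pretty-printing of the saved json):

```js
const extractor = new Extractor(spider, {
  location: './static/data',
  pretty: true // assumption: indents the saved json for readability
})
```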
Internally this library represents every crawled response with a `Resource` object. If you choose to override the `transformer` callback, it will be given a `Resource` as an argument. You can read the source code for details on its functionality. If you want to change the configuration options of the bundled transformer, you can customize it:
```js
const { Spider, Extractor, transformer } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, {
  location: './downloads',
  clean: true,
  transformer: transformer({
    attributeFilters: [
      /^custom_attribute_to_keep$/
    ]
  })
})

spider.crawl('/node/content-type')
```
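You can also replace the bundled transformer entirely. Since the callback receives a `Resource`, a custom transformer is just a function (a hedged sketch; consult the `Resource` source for the object's actual shape):

```js
const extractor = new Extractor(spider, {
  location: './downloads',
  clean: true,
  transformer: (resource) => {
    // inspect or reshape the crawled Resource here;
    // presumably the returned value is what gets written to disk
    return resource
  }
})
```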
### Logger
The logger, at the moment, is pretty simple with just one configuration option:
```js
new Logger([...emitters], {
  verbosity: 1
})
```
## To do
Currently there is effectively no test coverage, although test files for the
classes have been written with an instantiation check in each.