# 🔥 Firestore Backfire

Ultimate control over importing and exporting data from Firestore and the
Firestore Emulator, on your CLI and in your code.

This documentation is for 2.x. Find documentation for 1.x here.
## ✨ Features ✨

- Import and export your Firestore data with ease
- Specify which documents or collections are imported or exported using paths
  or by matching regex patterns
- Control the depth of subcollections to import or export
- Limit the number of documents to export
- Import and export data as NDJSON to a variety of storage sources:
  - local files
  - Google Cloud Storage
  - AWS S3
- Or implement your own data source
## Table of contents

- [Installation](#installation)
- [Usage and examples](#usage-and-examples)
- [Exporting data](#exporting-data)
- [Importing data](#importing-data)
- [Get document](#get-document)
- [List documents and collections](#list-documents-and-collections)
- [Count documents and collections](#count-documents-and-collections)
- [Connecting to Firestore](#connecting-to-firestore)
- [Data sources](#data-sources)
- [Configuration file](#configuration-file)
- [Migration](#migration)
- [Contributing](#contributing)
- [Changelog](#changelog)
- [License](#license)
## Installation

Install `firestore-backfire` and `@google-cloud/firestore` using your
favourite package manager.

```shell
yarn add firestore-backfire @google-cloud/firestore
pnpm add firestore-backfire @google-cloud/firestore
npm install firestore-backfire @google-cloud/firestore
```
### Peer dependencies for Google Cloud Storage

If you plan to import and export data from Google Cloud Storage, you should
install:

- `@google-cloud/storage`
### Peer dependencies for AWS S3

If you plan to import and export data from S3, you should install:

- `@aws-sdk/client-s3`
- `@aws-sdk/lib-storage`
Additionally, if you want to use a credential profile to run this program, you
should also install:

- `@aws-sdk/credential-provider-ini`

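For example, with npm:

```shell
npm install @aws-sdk/credential-provider-ini
```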
## Usage and examples

### CLI

Firestore Backfire can be called on the CLI using `backfire`. The aliases
`bf` and `firestore` are also provided for convenience.
If installed in your project, run it using your package manager:

```shell
yarn backfire import path-to-my-data ...
```

If installed globally, you can call it directly:

```shell
backfire import path-to-my-data ...
```
You can also use it in your `package.json` scripts.

```jsonc
// package.json
{
  "scripts": {
    "import-my-data": "backfire import path-to-my-data ..."
  }
}
```
#### CLI options

All options listed in the documentation have a CLI flag equivalent unless
otherwise specified. The flag will always be `--` followed by the option name.
For example, the option `limit` can be passed on the CLI using `--limit`. In
most cases, a shorthand may be available. Use the `backfire [command] --help`
command to see the available options and their respective flags.
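For instance, the following two commands are equivalent (`-l` is the shorthand
for `--limit`):

```shell
backfire export ./export/emails -P demo -K key.json --limit 100
backfire export ./export/emails -P demo -K key.json -l 100
```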
#### CLI examples

Export documents...

- to a file called `emails.ndjson` in an S3 bucket called `bucket`, using the
  AWS credentials profile named `default`
- from a Firestore project called `demo` using the credentials found at
  `key.json`
- from the `emails` and `messages` collections

```shell
backfire export s3://bucket/emails --awsProfile default -P demo -K key.json --paths emails messages --awsRegion us-east-1
```
Export documents...

- to a local file called `emails.ndjson` in the `export` folder
- from a Firestore project called `demo` using the credentials found at
  `key.json`
- from the `emails` collection
- where the document id starts with "abc" or "123"
- where the document id does not end with "xyz"

```shell
backfire export ./export/emails -P demo -K key.json --paths emails --match ^emails/abc ^emails/123 --ignore xyz$
```
Import documents...

- from a file called `emails.ndjson` in a Google Cloud Storage bucket called
  `bucket`, belonging to a project with the ID `gcp-demo`, using a service
  account key file called `gcp-demo.json`
- to the `demo` project in the Firestore Emulator running on port 8080
- where the document belongs to a root level collection (depth of 0)
- only import the first 10 documents
- overwrite any existing data

```shell
backfire import gs://bucket/emails --gcpProject gcp-demo --gcpKeyFile gcp-demo.json -P demo -E localhost:8080 --depth 0 --limit 10 --mode overwrite
```
### Node

Firestore Backfire exposes functions in Node that you can use to import and
export data using a data source.

```ts
import {
  importFirestoreData,
  exportFirestoreData,
} from "firestore-backfire";

await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);
```
Options for specifying the Firestore instance to connect to can be provided
through the `connection` parameter. The `reader` and `writer` parameters are
data sources (see the [data sources](#data-sources) section for more
information on how to create a data source). The `options` parameter allows
you to configure the import/export behaviour.
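As a minimal sketch (assuming a service account key at `./key.json`), an
export to a local file might look like this, using the default data source
factory described later in this document:

```ts
import { dataSourceFactory, exportFirestoreData } from "firestore-backfire";

// connection options instead of an existing Firestore instance
const connection = { project: "demo", keyFile: "./key.json" };
const options = { paths: ["emails"] };

// the factory selects the local file data source for a plain file path
const writer = await dataSourceFactory.createWriter("./emails.ndjson", options);

await exportFirestoreData(connection, writer, options);
```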
## Exporting data

To export data from Firestore, use the `export` command on the CLI, or use the
`exportFirestoreData` function in Node. Each document is exported as per the
`SerializedFirestoreDocument` interface as a line of NDJSON.

```shell
backfire export <path> [options]
```

```ts
import { exportFirestoreData } from "firestore-backfire";

await exportFirestoreData(connection, writer, options);
```
When using the CLI, `path` should point to the location where you want the
data to be exported to. This can be a path to a local file, a Google Cloud
Storage path (prefixed with `gs://`), or an S3 path (prefixed with `s3://`).

When using the `exportFirestoreData` function, the `connection` parameter can
be an instance of Firestore, or it can be an object that specifies options for
creating a connection to Firestore. The `writer` parameter must be an
implementation of `IDataSourceWriter`. See the section on
[data sources](#data-sources) for more information.
### Options

All options have a CLI flag equivalent unless otherwise specified. Follows the
`ExportFirestoreDataOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `paths` | `string[]` | Provide a list of paths where you want to export data from. This can be a collection path (e.g. `emails`), or a path to a document (e.g. `emails/1`). If not specified, all paths will be exported, starting from the root collections. |
| `match` | `RegExp[]` | Provide a list of regex patterns that a document path must match to be exported. |
| `ignore` | `RegExp[]` | Provide a list of regex patterns that prevent a document from being exported when its path matches any of the patterns. Takes precedence over `match`. |
| `depth` | `number` | Limit the subcollection depth to export documents from. Documents in a root collection have a depth of 0. If not specified, no limit is applied. |
| `limit` | `number` | Limit the number of documents to export. If not specified, no limit is applied. |
| `overwrite` | `boolean` | Overwrite any existing data at the output path. Defaults to `false`. |
| `update` | `number` | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to `5`. |
| `exploreInterval`\* | `number` | The interval (in milliseconds) at which chunks of paths are dequeued for exploration using the Firestore SDK's `listDocuments()` or `listCollections()` methods. Defaults to `10`. |
| `exploreChunkSize`\* | `number` | The chunk size to use when dequeuing paths for exploration. Defaults to `5000`. |
| `downloadInterval`\* | `number` | The interval (in milliseconds) at which chunks of document paths are dequeued to be filtered and downloaded from Firestore. Defaults to `1000`. |
| `downloadChunkSize`\* | `number` | The chunk size to use when dequeuing paths for download. Defaults to `limit` if supplied, otherwise all available paths are dequeued. |

\* Advanced configuration - default values should be suitable for most use
cases. Considered internal, so may change as the implementation changes.
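For reference, a sketch of these options in Node (assuming the
`ExportFirestoreDataOptions` interface named above is exported by the package):

```ts
import type { ExportFirestoreDataOptions } from "firestore-backfire";

const options: ExportFirestoreDataOptions = {
  paths: ["emails"],                       // export from the emails collection
  match: [/^emails\/abc/, /^emails\/123/], // only paths matching these patterns
  ignore: [/xyz$/],                        // but skip paths ending in xyz
  depth: 1,                                // at most one level of subcollections
  limit: 100,                              // stop after 100 documents
  overwrite: true,                         // replace existing data at the output path
};
```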
### Logging options

By default, only log messages at the info level and above are printed. Follows
the `LoggingOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `debug` | `boolean` | Print debug level logs and higher. |
| `verbose` | `boolean` | Print verbose level logs and higher. Overrides `debug`. |
| `quiet` | `boolean` | Silence all logs. Overrides `debug` and `verbose`. |
## Importing data

To import data into Firestore, use the `import` command on the CLI, or use the
`importFirestoreData` function in Node. The data being imported is expected to
be in NDJSON format, where each line follows the
`SerializedFirestoreDocument` interface.

```shell
backfire import <path> [options]
```

```ts
import { importFirestoreData } from "firestore-backfire";

await importFirestoreData(connection, reader, options);
```

When using the CLI, `path` should point to the location where you want the
data to be imported from. This can be a path to a local file, a Google Cloud
Storage path (prefixed with `gs://`), or an S3 path (prefixed with `s3://`).
When using the `importFirestoreData` function, the `connection` parameter can
be an instance of Firestore, or it can be an object that specifies options for
creating a connection to Firestore. The `reader` parameter must be an
implementation of `IDataSourceReader`. See the section on
[data sources](#data-sources) for more information.
> ⚠️ NOTE: When using the Firestore Emulator, importing a large amount of data
> can result in errors as the emulator is not designed to scale.
### Options

All options have a CLI flag equivalent unless otherwise specified. Follows the
`ImportFirestoreDataOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `paths` | `string[]` | Provide a list of paths where you want to import data from. This can be a collection path (e.g. `emails`), or a path to a document (e.g. `emails/1`). If not specified, all paths will be imported. |
| `match` | `RegExp[]` | Provide a list of regex patterns that a document path must match to be imported. |
| `ignore` | `RegExp[]` | Provide a list of regex patterns that prevent a document from being imported if its path matches any of the patterns. Takes precedence over `match`. |
| `depth` | `number` | Limit the subcollection depth to import documents from. Documents in a root collection have a depth of 0. If not specified, no limit is applied. |
| `limit` | `number` | Limit the number of documents to import. If not specified, no limit is applied. |
| `mode` | `"create"` \| `"insert"` \| `"overwrite"` \| `"merge"` | Specify how to handle importing documents that would overwrite existing data. See the [import mode](#import-mode) section for more information. Defaults to `create`. |
| `update` | `number` | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to `5`. |
| `flush`\* | `number` | The interval (in seconds) at which documents are flushed to Firestore. Defaults to `1`. |
| `processInterval`\* | `number` | The interval (in milliseconds) at which documents are processed as they stream in from the data source. Defaults to `10`. |
| `processLimit`\* | `number` | The maximum number of pending writes to Firestore. Defaults to `200`. |

\* Advanced configuration - default values should be suitable for most use
cases. Considered internal, so may change as the implementation changes.
### Import mode

The `mode` option specifies how to handle importing documents that would
overwrite existing data in Firestore. The default import mode is `create`.

- `create` mode will log an error when importing documents that already exist
  in Firestore, and existing documents will not be modified.
- `insert` mode will only import documents that do not exist, and existing
  documents will not be modified.
- `overwrite` mode will import documents that do not exist, and completely
  overwrite any existing documents.
- `merge` mode will import documents that do not exist, and merge imported
  data into existing documents.

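As a sketch, a `merge`-mode import into the emulator from Node might look like
this (the reader is created through the data source factory described in the
data sources section; the file path is illustrative):

```ts
import { dataSourceFactory, importFirestoreData } from "firestore-backfire";

// connect to the demo project in the local Firestore Emulator
const connection = { project: "demo", emulator: "localhost:8080" };
const options = { mode: "merge", limit: 10 } as const;

// read from a local NDJSON file
const reader = await dataSourceFactory.createReader("./export/emails.ndjson", options);

// merge imported documents into any existing documents
await importFirestoreData(connection, reader, options);
```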
### Logging options

By default, only log messages at the info level and above are printed. Follows
the `LoggingOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `debug` | `boolean` | Print debug level logs and higher. |
| `verbose` | `boolean` | Print verbose level logs and higher. Overrides `debug`. |
| `quiet` | `boolean` | Silence all logs. Overrides `debug` and `verbose`. |
## Get document

Have you ever wanted to quickly inspect or export a document as JSON from
Firestore? This CLI command can help you do just that. `path` should be a
valid Firestore document path. The document is printed as pretty JSON.

Also ensure you provide appropriate options for
[connecting to Firestore](#connecting-to-firestore).

```shell
backfire get <path> [options]
```
### Options

All options have a CLI flag equivalent unless otherwise specified. Follows the
`GetFirestoreDataOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `stringify` | `boolean` or `number` | `JSON.stringify()` the output. Pass `true` to use the default indent of 2, or pass a number to specify the indent amount. |

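For example, printing an illustrative document with an indent of 2:

```shell
backfire get emails/1 -P demo -K key.json --stringify 2
```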
## List documents and collections

List the document IDs or collection IDs at the specified `path`.

Also ensure you provide appropriate options for
[connecting to Firestore](#connecting-to-firestore).

```shell
backfire list documents <path> [options]
backfire list collections [path] [options]
```

When listing collections, you may leave `path` empty to list root collections,
or pass a valid Firestore document path to list its subcollections.
### Options

All options have a CLI flag equivalent unless otherwise specified. Follows the
`ListFirestoreDataOptions` interface.

| Option | Type | Description |
| --- | --- | --- |
| `limit` | `number` | Limit the number of documents/collections to return. Note that this does not "truly" limit the API call, it only truncates the output after the data is received from Firebase. |

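For example (paths are illustrative):

```shell
backfire list documents emails -P demo -K key.json --limit 20
backfire list collections emails/1 -P demo -K key.json
```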
## Count documents and collections

Count the number of documents in a collection, or the number of collections at
the specified `path`.

Also ensure you provide appropriate options for
[connecting to Firestore](#connecting-to-firestore).

```shell
backfire count documents <path> [options]
backfire count collections [path] [options]
```

When counting collections, you may leave `path` empty to count root
collections, or pass a valid Firestore document path to count its
subcollections.
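For example, counting the documents in the `emails` collection, then counting
the root collections:

```shell
backfire count documents emails -P demo -K key.json
backfire count collections -P demo -K key.json
```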
## Connecting to Firestore

In order to read and write data to Firestore, you will need to specify some
options for the connection. Follows the `FirestoreConnectionOptions`
interface.

| Option | Type | Description |
| --- | --- | --- |
| `project` | `string` | The ID of the Firestore project to connect to. |
| `adc` | `boolean` | Use Application Default Credentials. |
| `keyFile` | `string` | The path to a service account's private key JSON file. Takes precedence over `adc`. |
| `emulator` | `string` or `boolean` | Connect to a local Firestore emulator. Defaults to `localhost:8080`. Pass a string value to specify a different host. Takes precedence over `adc` and `keyFile`. |
| `credentials`\* | `object` | Service account credentials. Fields `client_email` and `private_key` are expected. Takes precedence over `adc`, `keyFile` and `emulator`. |

\* Not available in the CLI.
- The `project` option is always required
- To connect to a real Firestore instance, you must specify `adc` or
  `keyFile`, or pass a `credentials` object (Node only)
- If you are connecting to a local Firestore emulator, you can use the
  `emulator` option

As an alternative, these options can also be provided through a
[configuration file](#configuration-file) or as environment variables. Note
that CLI options will always take precedence over environment variables.

- `GOOGLE_CLOUD_PROJECT` can be used to provide `project`
- `GOOGLE_APPLICATION_CREDENTIALS` can be used to provide `keyFile`
- `FIRESTORE_EMULATOR_HOST` can be used to provide `emulator`

In Node, you can also pass an existing instance of Firestore instead of
providing connection options.
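For example, both forms below are sketches of a valid `connection` argument
(the `Firestore` constructor options come from `@google-cloud/firestore`;
paths are illustrative):

```ts
import { Firestore } from "@google-cloud/firestore";
import { dataSourceFactory, exportFirestoreData } from "firestore-backfire";

const options = { paths: ["emails"] };
const writer = await dataSourceFactory.createWriter("./emails.ndjson", options);

// Option 1: let Firestore Backfire create the connection from options
await exportFirestoreData({ project: "demo", keyFile: "./key.json" }, writer, options);

// Option 2: pass an existing Firestore instance
const firestore = new Firestore({ projectId: "demo", keyFilename: "./key.json" });
await exportFirestoreData(firestore, writer, options);
```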
## Data sources

A data source provides a way to read and write data to an external location.
This package comes with a few implementations, and exports interfaces for you
to implement your own in Node if the provided implementations do not suit your
needs.
### Local

This data source reads and writes data as local files on your machine. To use
this data source on the CLI, specify a `path` that points to a valid file path
(note that this is different from v1). If the path is in a directory that does
not exist, it will be created for you.

No other configuration options are required.
### Google Cloud Storage

This data source reads and writes data from a Google Cloud Storage bucket. To
use this data source on the CLI, specify a `path` beginning with `gs://`.

Credentials for reading and writing to the Google Cloud Storage bucket must
also be provided as CLI options or through a
[configuration file](#configuration-file).

| Option | Type | Description |
| --- | --- | --- |
| `gcpProject` | `string` | The Google Cloud project the bucket belongs to. |
| `gcpAdc` | `boolean` | Use Application Default Credentials. |
| `gcpKeyFile` | `string` | Path to the service account credentials file to use. Takes precedence over `gcpAdc`. |
| `gcpCredentials`\* | `object` | Service account credentials. Fields `client_email` and `private_key` are expected. Takes precedence over `gcpAdc` and `gcpKeyFile`. |

\* Not available in the CLI.
- The `gcpProject` option is always required
- You must specify `gcpAdc` or `gcpKeyFile`, or pass a `gcpCredentials` object
  (Node only)

Alternatively, these values can also be provided through the corresponding
environment variables:

- `GOOGLE_CLOUD_PROJECT` can be used to provide `gcpProject`
- `GOOGLE_APPLICATION_CREDENTIALS` can be used to provide `gcpKeyFile`

IMPORTANT: These environment variables are also used by the
[Firestore connection options](#connecting-to-firestore). If you need to use
different credentials for connecting to Firestore and accessing Google Cloud
Storage, you can override the environment variables by passing the values as
CLI options or through a configuration file.
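For example, a sketch using one service account for Firestore and another for
the bucket (file names are illustrative):

```shell
GOOGLE_APPLICATION_CREDENTIALS=./firestore-key.json \
  backfire export gs://bucket/emails -P demo --gcpProject gcp-demo --gcpKeyFile ./storage-key.json
```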
### AWS S3

This data source reads and writes data from an S3 bucket. To use this data
source on the CLI, specify a `path` beginning with `s3://`.

Credentials for reading and writing to the S3 bucket must also be provided as
CLI options or through a [configuration file](#configuration-file).

| Option | Type | Description |
| --- | --- | --- |
| `awsRegion` | `string` | The AWS region to use. |
| `awsProfile` | `string` | The name of the profile to use from your local AWS credentials. Requires `@aws-sdk/credential-provider-ini` to be installed. |
| `awsAccessKeyId` | `string` | The access key id to use. Takes precedence over the `awsProfile` option: if you provide `awsProfile` as well as access keys, the access keys will be used. |
| `awsSecretAccessKey` | `string` | The secret access key to use. Takes precedence over the `awsProfile` option: if you provide `awsProfile` as well as access keys, the access keys will be used. |

- The `awsRegion` option is always required
- You can choose to use either `awsProfile`, or `awsAccessKeyId` and
  `awsSecretAccessKey`

Alternatively, these values can also be provided through the corresponding
environment variables:

- `AWS_REGION` can be used to provide `awsRegion`
- `AWS_PROFILE` can be used to provide `awsProfile`
- `AWS_ACCESS_KEY_ID` can be used to provide `awsAccessKeyId`
- `AWS_SECRET_ACCESS_KEY` can be used to provide `awsSecretAccessKey`

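For example (values are illustrative):

```shell
AWS_REGION=us-east-1 AWS_PROFILE=default backfire export s3://bucket/emails -P demo -K key.json
```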
### Creating a data source in Node

All provided data source implementations are registered in a default instance
of `DataSourceFactory`, which is exposed to you in Node. You can create a
reader or writer implementation directly from the factory by calling the
`createReader()` or `createWriter()` method.

The factory will automatically select the data source to create based on the
`path` it was given. The default implementation will fall back to using the
local data source if the path does not match any other data sources.

```ts
import {
  dataSourceFactory,
  importFirestoreData,
  exportFirestoreData,
} from "firestore-backfire";

const path = "s3://my-bucket/exported-data.ndjson";
const reader = await dataSourceFactory.createReader(path, options);
const writer = await dataSourceFactory.createWriter(path, options);

await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);
```
### Custom data sources

There are two types of data sources: readers and writers. A reader reads text
data from a stream, whilst a writer writes lines of text data to a stream. A
data source does not need to provide both a reader and a writer, but naturally
you cannot import data without a reader, or export data without a writer.

To create a data source and make it usable with Firestore Backfire, follow
these steps:

- Create at least one of the following:
  - an implementation of `IDataSourceReader`
  - an implementation of `IDataSourceWriter`
- Construct an `IDataSource` object, in which you should define:
  - A unique `id` for the data source
  - A `match` function, which takes a `path` parameter and returns `true` if
    the path can be used with this data source
  - A `reader` property, which can be your `IDataSourceReader` class directly,
    or a function that will create an instance of it (this can be left empty
    if you do not want to import data)
  - A `writer` property, which can be your `IDataSourceWriter` class directly,
    or a function that will create an instance of it (this can be left empty
    if you do not want to export data)
- Register the data source using the `register()` method on the default
  `DataSourceFactory` instance (exposed as `dataSourceFactory`)
Once your data source has been registered, you can use the `createReader()` or
`createWriter()` methods on the default `DataSourceFactory` instance to
construct your data source.

Alternatively, if you do not need to support different path types or use the
default implementations, you can instantiate your custom data source yourself
and pass it directly to `importFirestoreData` or `exportFirestoreData`.
#### Implementation example

You can always take a look at how the provided implementations are written by
looking at the source code, and seeing how they are registered. Below is a
basic example as a reference.

```ts
import {
  IDataSourceReader,
  IDataSourceWriter,
  dataSourceFactory,
} from "firestore-backfire";

class MyDataReader implements IDataSourceReader {
  // ... implement the IDataSourceReader interface here
}

class MyDataWriter implements IDataSourceWriter {
  // ... implement the IDataSourceWriter interface here
}

interface MyCustomOptions {
  username?: string;
  password?: string;
}

dataSourceFactory.register<MyCustomOptions>({
  id: "custom",
  match: (path) => path.startsWith("custom://"),
  reader: { useClass: MyDataReader },
  writer: {
    useFactory: async (path, options) => {
      if (!options.username) throw new Error("username is required");
      if (!options.password) throw new Error("password is required");
      return new MyDataWriter(path, options.username, options.password);
    },
  },
});

const path = "custom://my-data";
const reader = await dataSourceFactory.createReader<MyCustomOptions>(path, {
  username: "...",
  password: "...",
});
```
## Configuration file

Instead of providing options on the CLI, you can also set defaults through a
configuration file. You can use the flag `--config <path>` to point to a
specific file to use as configuration. Note that CLI options will always
override options provided through a configuration file.

IMPORTANT: Do not commit any secrets in your config file to version control.

The configuration file is loaded using
cosmiconfig, which supports a wide range of configuration file formats. Some
examples of supported formats:

- `.backfirerc.json`
- `.backfirerc.yaml`
- `.backfirerc.js`
- `backfire.config.js`

Sample YAML config:

```yaml
project: demo-project
keyFile: ./service-account.json
emulator: localhost:8080
paths:
  - emails
match:
  - ^emails/123
ignore:
  - xyz$
depth: 2
```
Sample JSON config:

```json
{
  "project": "demo-project",
  "keyFile": "./service-account.json",
  "emulator": "localhost:8080",
  "paths": ["emails"],
  "match": ["^emails/123"],
  "ignore": ["xyz$"],
  "depth": 2
}
```
## Migration

### 1.x to 2.x

Firestore Backfire v2 is a rewrite of v1 with a more up-to-date and extensible
design. It provides new and improved functionality, uses NDJSON as the data
format, and no longer uses worker threads.
Breaking changes
-p
has been renamed to -P
-k
has been renamed to -K
-e
has been renamed to -E
--patterns
has been renamed to --match
--workers
has been removed as worker threads are no longer used--logLevel
has been removed, use --verbose
, --debug
or --silent
instead--prettify
has been renamed to --stringify
--force
has been renamed to --overwrite
--mode
values have changed to "create", "insert", "overwrite", "merge"- Import and export file format changed to NDJSON (not backward compatible)
#### New features

- New options:
  - `ignore` (`--ignore, -i`) to ignore paths
  - `limit` (`--limit, -l`) to limit the number of documents imported/exported
  - `update` (`--update`) to specify the frequency of update messages
  - A few more advanced configuration options
- New commands:
  - `backfire get <path>` to get a document from Firestore
  - `backfire list:documents <path>` to list documents in a collection
  - `backfire list:collections [path]` to list root collections or
    subcollections
- Support for passing some options as environment variables:
  - `GOOGLE_CLOUD_PROJECT`
  - `GOOGLE_APPLICATION_CREDENTIALS`
  - `AWS_PROFILE`
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_REGION`
- Ability to create custom data sources in Node
- Ability to use an existing Firestore instance in Node
## Contributing

Thanks goes to these wonderful people (emoji key):

This project follows the
all-contributors
specification. Contributions of any kind are welcome! Please follow the
contributing guidelines.

## Changelog

Please see CHANGELOG.md.

## License

Please see LICENSE.