couchdb-dump
A command line tool that outputs all documents in a CouchDB database. Also included is a command line tool that takes that output as input and loads it back into a CouchDB database.
Reading and writing the data is done via stdin and stdout, respectively. The output of cdbdump
is a JSON document containing a "docs" array element which contains every document in the database. The cdbload
command takes an input which is exactly the same as the output of cdbdump
and writes every document in it into the target database.
Internally this is just calling on CouchDB's _all_docs and _bulk_docs endpoints. To do what it does, this package just glues together the power of Node.js streams and the following modules...
Many thanks to the authors of those amazing packages for making this possible.
Installation
npm install couchdb-dump -g
Usage Examples
The following will dump the contents of a CouchDB database called myhugedatabase running on port 5984 on localhost. The output is written to a file called myhugedatabase.json.
cdbdump -d myhugedatabase > myhugedatabase.json
If you are doing this for archiving purposes you could do something cool like this to extract and gzip it in one step ...
cdbdump -d myhugedatabase | gzip > myhugedatabase.json.gz
Both of the following command examples will load all the documents in the myhugedatabase.json file into a CouchDB database called myhugeduplicate.
cdbload -d myhugeduplicate < myhugedatabase.json
OR
cat myhugedatabase.json | cdbload -d myhugeduplicate
You can even do this ...
cdbdump -d myhugedatabase | cdbload -d myhugeduplicate
... which streams all the docs from one CouchDB database into a second one. While this works well, you should probably take a look at using CouchDB's awesome built-in replication features instead.
Full Usage
If you execute the cdbdump
or cdbload
commands with no arguments or with --help, the following usage information will be printed on the console and the command will exit.
usage: cdbdump [-u username] [-p password] [-h host] [-P port] [-r protocol] [-s json-stringify-space] [-k dont-strip-revs] -d database
usage: cdbload [-u username] [-p password] [-h host] [-P port] [-r protocol] [-v verbose] -d database
The username and password, if supplied, will be used for authentication via Basic Auth (RFC2617). Currently this is the only supported authentication method.
The -s paramater for cdbdump
is used as the third paramater to JSON.stringify() for the amount of white space to use if you want the output to be pretty-printed.
The -v parameter for cdbload
will print CouchDB's response body to the console. That will be an array with one JSON result object for every object loaded in!!
CouchDB revision fields
By default, the _rev
element of every document in the database is stripped out of the output of cdbdump
. This allows the list of documents to be used as input to cdbload
. If the -k parameter is given to cdbdump
, then the _rev
elements will not be stripped out and this will cause CouchDB to be unable to easily load these documents through the _bulk_docs
endpoint because every document will error with a mismatched revision message (assuming you are loading into an empty database). See CouchDB _bulk_docs documentation for more details and ways you can manipulate the cdbdump
output to get around that in case you need to keep the _rev
values in your dump.
Default options for cdbdump
host = localhost
port = 5984
protocol = http
json-stringify-space = 0
dont-strip-revs = false
Default options for cdbload
host = localhost
port = 5984
protocol = http
verbose = false
Why Another CouchDB Dump Command?
I wrote this because I couldn't find a cli dump tool for CouchDB that allowed me to pipe output directly. Since then I also found these other excellent options which you should definitely check out as well:
Testing
I had to get this going quick and dirty at the moment so there are no automated tests in place yet.
Contributing
Fork and PR. Thanks!!
Release History