Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

couchimport

Package Overview
Dependencies
Maintainers
0
Versions
63
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

couchimport

CouchDB import scripts

  • 2.1.1
  • latest
  • Source
  • npm
  • Socket score

Version published
Weekly downloads
53
decreased by-10.17%
Maintainers
0
Weekly downloads
 
Created
Source

couchimport

Introduction

When populating CouchDB databases, often the source of the data is initially some JSON documents in a file, or some structured CSV/TSV data from another database's export.

couchimport is designed to assist with importing such data into CouchDb efficiently. Simply pipe a file full of JSON documents into couchimport, telling the URL and database to send the data to.

Note: couchimport used to handle the CSV to JSON conversion, but this part is now handled by csvtojsonlines, keeping this package smaller and easier to maintain. The couchimport@1.6.5 package is the last version to support CSV/TSV natively - from 2.0 onwards, couchimport is only for pouring JSONL files into CouchDB.

Also note: the companion CSV export utility (couchexport) is now hosted at couchcsvexport.

Installation

Install using npm or another Node.js package manager:

npm install -g couchimport

Usage

couchimport can either read JSON docs (one per line) from stdin e.g.

cat myfile.json | couchimport

or by passing a filename as the last parameter:

couchimport myfile.json

couchimport's configuration parameters can be stored in environment variables or supplied as command line arguments.

Configuration - environment variables

Simply set the COUCH_URL environment variable e.g. for a hosted Cloudant database

export COUCH_URL="https://myusername:myPassw0rd@myhost.cloudant.com"

and define the name of the CouchDB database to write to by setting the COUCH_DATABASE environment variable e.g.

export COUCH_DATABASE="mydatabase"

Simply pipe the text data into "couchimport":

cat mydata.jsonl | couchimport

Configuring - command-line options

Supply the --url and --database parameters as command-line parameters instead:

couchimport --url "http://user:password@localhost:5984" --database "mydata" mydata.jsonl

or by piping data into stdin:

cat mydata.jsonl | couchimport --url "http://user:password@localhost:5984" --database "mydata" 

Handling CSV/TSV data

We can use another package csvtojsonlines to convert CSV/TSV files into a JSONL stream acceptable to couchimport:

# CSV file ----> JSON lines ---> CouchDB
cat transactions.csv | csvtojsonlines --delimiter ',' | couchimport --db ledger

Generating random data

couchimport can be paired with datamaker to generate any amount of sample data:

# template ---> datamaker ---> 100 JSON docs ---> couchimport ---> CouchDB
echo '{"_id":"{{uuid}}","name":"{{name}}","email":"{{email true}}","dob":"{{date 1950-01-01}}"}' | datamaker -f json -i 100 | couchimport --db people
written {"docCount":100,"successCount":1,"failCount":0,"statusCodes":{"201":1}}
written {"batch":1,"batchSize":100,"docSuccessCount":100,"docFailCount":0,"statusCodes":{"201":1},"errors":{}}
Import complete

or with the template as a file:

cat template.json | datamaker -f json -i 10000 | couchimport --db people

Understanding errors

We know if we get an HTTP 4xx/5xx response, then all of the documents failed to be written to the database. But as couchimport is writing data in bulk, the bulk request may get an HTTP 201 response that doesn't mean that all of the documents were written. Some of the document ids may have been in the database already. So the couchimport output includes counts of the number of documents that were written successfully and the number that failed, and a tally of the HTTP response codes and individual document error messages:

e.g.

written {"batch":10,"batchSize":1,"docSuccessCount":4,"docFailCount":6,"statusCodes":{"201":10},"errors":{"conflict":6}}

The above log line shows that after the tenth batch of writes, we have written 4 documents and failed to write 6 others. There were six "conflict" errors, meaning that there was a clash of document id or id/rev combination.

Parallel writes

Older versions of couchimport supported the ability to have multiple HTTP requests in flight at any one time, but the new simplified couchimport does not. To achieve the same thing, simply split your file of JSON docs into smaller pieces and run multiple couchimport jobs:

# split large file into files 1m lines each
# this will create files xaa, xab, xac etc
split -l 1000000 massive.txt
# find all files starting with x and using xargs,
# spawn a max of 2 process at once running couchimport, 
# one for each file
find . -name "x*" | xargs -t -I % -P 2 couchimport --db test %

Environment variables reference

  • COUCH_URL - the url of the CouchDB instance (required, or to be supplied on the command line)
  • COUCH_DATABASE - the database to deal with (required, or to be supplied on the command line)
  • COUCH_BUFFER_SIZE - the number of records written to CouchDB per bulk write (defaults to 500, not required)
  • IAM_API_KEY - to use IBM IAM to do authentication, set the IAM_API_KEY to your api key and a bearer token will be used in the HTTP requests.

Command-line parameters reference

You can also configure couchimport using command-line parameters:

  • --help - show help
  • --url/-u - the url of the CouchDB instance (required, or to be supplied in the environment)
  • --database/--db/-d - the database to deal with (required, or to be supplied in the environment)
  • --buffer/-b - the number of records written to CouchDB per bulk write (defaults to 500, not required)

Using programmatically

In your project, add couchimport into the dependencies of your package.json or run npm install --save couchimport. In your code, require the library with

const couchimport = require('couchimport')

and your options are set in an object whose keys are the same as the command line paramters:

e.g.

const opts = { url: "http://localhost:5984", database: "mydb", rs: fs.createReadStream('myfile.json') }
await couchimport(opts)

Note: rs is the readstream where data will be read (default: stdin) and ws is the write stream where the output will be written (default: stdout)

Keywords

FAQs

Package last updated on 31 Jul 2024

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc