CSV

CSV Parser for Node Apps

Features:

  • Easy to use
  • Simple, lightweight, and fast
  • Parses CSV data to JS objects
  • Header field mapping
  • Value field mapping
  • Supports the CSV spec (RFC 4180)
  • Memory safe: keeps at most 1 row in memory during streaming
  • Supports streaming, string parsing, and promises
  • Supports single and multi-byte newlines (\n, \r\n)
  • Custom newline support
  • Works with buffers and strings
  • Only 1 dependency: lodash

About

I was working on a BI system at work, and as our user base kept growing, our jobs got slower and slower. A huge bottleneck for us was CSV parsing. We tried many of the popular CSV libraries on npm, but were unable to get a decent level of performance consistently from our many jobs. We are parsing around a billion rows per day, with the average row being around 1.5 KB and 80+ columns (it's not uncommon for a single job to parse over 10 GB of data). Some columns have complex JSON strings, some are empty, some are in languages like Chinese, and some use \r\n for newlines. We needed something fast, chunk-boundary safe, and character-encoding safe; something that worked well on all of our various CSV formats. Some of the other parsers failed to accurately parse \r\n newlines across chunk boundaries, some failed altogether, some caused out-of-memory errors, some were just crazy slow, and some had encoding errors with multi-byte characters.

This library is the result of those learnings. It's fast, stable, works on any data, and is highly configurable. Feel free to check out and run the benchmarks. I'm currently able to achieve around 100 MB per second of parsing (on an i7 9700K), depending on the data and CPU.

Usage

Add csv as a dependency for your app and install via npm

npm install @danmasta/csv --save

Require the package in your app

const csv = require('@danmasta/csv');

By default, csv() returns a transform stream interface. You can use the methods promise(), map(), each(), and parse() to consume row objects via promises or as a synchronous array.
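
As a minimal sketch of the stream interface, assuming the returned transform stream emits one parsed row object per 'data' event (consistent with the objectMode default noted in the options below):

const fs = require('fs');
const csv = require('@danmasta/csv');

// Pipe a CSV file through the parser and log each row object as it is emitted
fs.createReadStream('./data.csv')
    .pipe(csv())
    .on('data', row => {
        console.log('Row:', row);
    })
    .on('end', () => {
        console.log('Done parsing');
    });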

Options

  • headers (boolean|object|array|function): If truthy, reads the first line as header fields. If false, disables header fields and replaces them with integer values of 0-n. If an object, each original header field is replaced with its value in the mapping. If a function, each header field is set to the function's return value. Default is true
  • values (boolean|object|function): Same as headers, but for values. If you want to replace values on the fly, provide an object: {'null': null, 'True': true}. If a function, each value is replaced with the function's return value. Default is null
  • newline (string): Which character(s) to use for newlines. Default is \n
  • cb (function): Function to call when ready to flush a complete row. Used only for the Parser class. If you implement a custom parser you will need to include a cb function. Default is this.push
  • buffer (boolean): If true, uses a string decoder to parse buffers. This is set automatically by the convenience methods, but will need to be set on custom parser instances. Default is false
  • encoding (string): Which encoding to use when parsing rows in buffer mode. This doesn't matter when using strings or streams not in buffer mode. Default is utf8
  • stream (object): Options to be passed to the transform stream constructor. Default is { objectMode: true, encoding: null }
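
For example, a small sketch of the object form of values, assuming the keys are matched against the raw string value of each field (the mapping below is illustrative):

// Replace literal 'null'/'true'/'false' strings with real JS values while parsing
let values = {
    'null': null,
    'true': true,
    'false': false
};

let rows = csv.parse(str, { values });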

Methods

  • Parser(opts, str): Parser class for generating custom CSV parser instances. Accepts an options object and an optional string or buffer to parse
  • Parser.parse(str, opts): Synchronous parse function. Accepts a string or buffer and an optional options object. Returns an array of parsed rows
  • promise(): Returns a promise that resolves with an array of parsed rows
  • map(fn): Runs an iterator function over each row. Returns a promise that resolves with the new array
  • each(fn): Runs an iterator function over each row. Returns a promise that resolves with undefined
  • parse(str): Parses a string or buffer and pushes rows to the stream. You need to call flush() when finished. Returns the csv parser instance
  • flush(): Flushes remaining data and pushes final rows to the stream. Returns the csv parser instance
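
As a quick illustration of map(), here is a minimal sketch assuming it resolves with the array of values returned by the iterator (the 'count' field is hypothetical):

// Coerce a numeric column on each row, then use the mapped results
csv().parse(str).map(row => {
    row.count = Number(row.count);
    return row;
}).then(rows => {
    console.log('Mapped rows:', rows);
});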

Examples

Use CRLF line endings

csv.parse(str, { newline: '\r\n' });

Parse input from a stream and write rows to a destination stream

let read = fs.createReadStream('./data.csv');

read.pipe(csv()).pipe(myDestinationStream());

Parse a string and get results in a promise

csv().parse(str).promise().then(res => {
    console.log('Rows:', res);
});

Run an iteration function over each row using each()

csv().parse(str).each(row => {
    console.log('Row:', row);
}).then(() => {
    console.log('Parse complete!');
});

Update header values

let headers = {
    time: 'timestamp',
    latitude: 'lat',
    longitude: 'lng'
};

csv.parse(str, { headers });

Update headers and/or values with functions

function headers (str) {
    return str.toLowerCase();
}

function values (val) {
    if (val === 'false') return false;
    if (val === 'true') return true;
    if (val === 'null' || val === 'undefined') return null;
    return val.toLowerCase();
}

csv.parse(str, { headers, values });

Create a custom parser that pushes to a queue

const Queue = require('queue');
const q = new Queue();

const parser = csv({ cb: q.push.bind(q) });

parser.parse(str);
parser.flush();

Testing

Testing is currently run using mocha and chai. To execute tests, just run npm run test. To generate unit test coverage reports, just run npm run coverage.

Benchmarks

Benchmarks are currently built using gulp. Just run gulp bench to test timings and bytes per second.

Filename         Mode    Density  Rows    Bytes       Time (ms)  Rows/Sec      Bytes/Sec
---------------  ------  -------  ------  ----------  ---------  ------------  --------------
Earthquakes      String  0.11     72,689  11,392,064  121.66     597,492.68    93,641,058.06
Earthquakes      Buffer  0.11     72,689  11,392,064  104.8      693,582.44    108,700,567.12
Pop-By-Zip-LA    String  0.18     3,199   120,912     1.88       1,702,582.7   64,352,197.35
Pop-By-Zip-LA    Buffer  0.18     3,199   120,912     1.28       2,490,424.44  94,130,103.07
Stats-By-Zip-NY  String  0.41     2,369   263,168     15.68      151,058.46    16,780,816.02
Stats-By-Zip-NY  Buffer  0.41     2,369   263,168     14.52      163,183.15    18,127,726.59

Speed and throughput are highly dependent on the density (token matches / byte length) of the data, as well as the size of the objects created. The earthquakes test file represents a roughly average level of density for common CSV data. The stats-by-zip-code file represents an example of very dense CSV data.

Contact

If you have any questions, feel free to get in touch.
