#CSV2JSON
An all-in-one Node.js CSV to JSON converter. It supports big data sets, a CLI, a web server, nested JSON, customised parsers, streams, pipes, and more!
#IMPORTANT!!
Since version 0.3, the core class of csvtojson inherits from the stream.Transform class. It therefore behaves like a normal Stream object, and the node-csv features are no longer available. The usage is now:
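var Converter=require("csvtojson").core.Converter;
var fs=require("fs");
var csvFileName="./myCSVFile";
var fileStream=fs.createReadStream(csvFileName);
// constructResult:true builds the complete JSON result in memory
var csvConverter=new Converter({constructResult:true});
// end_parsed is emitted once the whole file has been parsed
csvConverter.on("end_parsed",function(jsonObj){
  console.log(jsonObj); // the final result
});
// the converter is a stream, so the file is simply piped into it
fileStream.pipe(csvConverter);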
##Menu
GitHub: https://github.com/Keyang/node-csvtojson
##Installation
npm install -g csvtojson
##Features
- Powerful library for your Node.js applications that process CSV data.
- Extremely straightforward.
- Multiple input support: CSV file, Readable Stream, CSV string, etc.
- Highly extensible with your own rules and parsers for outputs.
- Multiple interfaces (web service, command line).
##Usage
###Command Line Tools
csvtojson
Example
csvtojson ./myCSVFile
Or use pipe:
cat myCSVFile | csvtojson
To start a web server:
csvtojson startserver [options]
For advanced usage and the supported parameters, check the help:
csvtojson --help
###WebService
Once the web server has been initialised, CSV data can be sent to it as the body of an HTTP POST request.
For example, start the web server with the default configuration:
csvtojson startserver
Then use curl to perform a web request:
curl -X POST -d "date,*json*employee.name,*json*employee.age,*json*employee.number,*array*address,*array*address,*jsonarray*employee.key,*jsonarray*employee.key,*omit*id
2012-02-12,Eric,31,51234,Dunno Street,Kilkeny Road,key1,key2,2
2012-03-06,Ted,28,51289,Cambridge Road,Tormore,key3,key4,4" http://127.0.0.1:8801/parseCSV
###API
Use the csvtojson library in your own project.
Add csvtojson to your package.json or install it through npm:
npm install csvtojson
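####Quick Start
The core of the tool is the Converter class. Before version 0.3 it was based on the node-csv library (version 0.3.6) and had all of its features; since version 0.3 it has no dependencies and works as a stream (see the Change Log). To start a parse, simply use the following code: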
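var Converter=require("csvtojson").core.Converter;
var fs=require("fs");
var csvFileName="./myCSVFile";
var fileStream=fs.createReadStream(csvFileName);
// constructor parameters (see Params); an empty object uses the defaults
var param={};
var csvConverter=new Converter(param);
// end_parsed delivers the complete result as a JSON object
csvConverter.on("end_parsed",function(jsonObj){
  console.log(jsonObj);
});
// read from the file and send the data to the converter
fileStream.pipe(csvConverter);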
####Params
The parameters for the Converter constructor are:
- constructResult: true/false. Whether to construct the final JSON object in memory, which is then passed to the "end_parsed" event. Set it to false when dealing with huge CSV data. Default: true.
- delimiter: the delimiter used for separating columns. Default: ","
- quote: if a column contains the delimiter, the quote character can be used to surround the column content, e.g. "hello, world" won't be split into two columns while parsing. Default: " (double quote)
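
To illustrate how the delimiter and quote parameters interact, here is a minimal sketch of quote-aware row splitting. The `splitRow` function is hypothetical and written only for this example; it is not csvtojson's actual implementation. It assumes single-character delimiter and quote values and does not handle escaped quotes inside a quoted cell.

```javascript
// Hypothetical helper, not part of csvtojson: splits one CSV line into cells,
// treating delimiters inside a quoted section as ordinary characters.
function splitRow(line, delimiter, quote) {
  delimiter = delimiter || ",";
  quote = quote || "\"";
  var cells = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line[i];
    if (ch === quote) {
      // Toggle quoted mode; the quote character itself is dropped.
      inQuotes = !inQuotes;
    } else if (ch === delimiter && !inQuotes) {
      // An unquoted delimiter closes the current cell.
      cells.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current);
  return cells;
}

console.log(splitRow("1,\"hello, world\",x"));
// [ '1', 'hello, world', 'x' ]
console.log(splitRow("a;b;c", ";"));
// [ 'a', 'b', 'c' ]
```

Without the quote handling, the second cell of the first line would be split at the comma inside "hello, world".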
####Parser
CSVTOJSON allows adding customised parsers, which define what to parse and how to parse it.
This is the main power of the tool: the developer only needs to concentrate on how to deal with the data, while other concerns such as streaming, memory, web and CLI are handled automatically.
How to add a customised parser:
var parserMgr=require("csvtojson").core.parserMgr;
parserMgr.addParser("myParserName",/^\*parserRegExp\*/,function (params){
  var columnTitle=params.head; // e.g. "*parserRegExp*myField"
  var fieldName=columnTitle.replace(this.regExp, ""); // strip the prefix to get the field name
  params.resultRow[fieldName]="Hello my parser"+params.item;
});
parserMgr's addParser function takes three parameters:
- parser name: the name of your parser. It should be unique.
- Regular Expression: used to test whether a column of CSV data uses this parser. In the example above, any column whose first row starts with "*parserRegExp*" will use it.
- Parse function callback: where the parsing happens. The converter works row by row, so the function is called each time a cell in the CSV data needs to be parsed.

The parameter of the parse function is a JSON object. It contains the following fields:
- head: the data in the column's first row. It generally contains field information. e.g. "*array*items"
- item: the data inside the current cell. e.g. item1
- itemIndex: the index of the current cell within its row. e.g. 0
- rawRow: a reference to the current row in array format. e.g. ["item1", 23 ,"hello"]
- resultRow: a reference to the result row in JSON format. e.g. {"name":"Joe"}
- rowIndex: the index of the current row in the CSV data. It starts from 1, since 0 is the head. e.g. 1
- resultObject: a reference to the result object in JSON format. It always has a field called csvRows, which is an array, and it changes as parsing goes on. e.g.
{
  "csvRows":[
    {
      "itemName":"item1",
      "number":10
    },
    {
      "itemName":"item2",
      "number":4
    }
  ]
}
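
The dispatch described in this section (match the column head against each registered regular expression, then call the matching parse function once per cell) can be sketched as follows. `createParserRegistry`, `findParser` and `parseRow` are hypothetical names invented for this illustration and are not csvtojson's API; the fallback for unmatched columns is also an assumption.

```javascript
// Hypothetical stand-in for the parser manager, written for illustration only.
function createParserRegistry() {
  var parsers = [];
  return {
    // Register a parser under a name with the pattern that selects its columns.
    addParser: function (name, regExp, parse) {
      parsers.push({ name: name, regExp: regExp, parse: parse });
    },
    // Return the first parser whose pattern matches the column head, if any.
    findParser: function (head) {
      for (var i = 0; i < parsers.length; i++) {
        if (parsers[i].regExp.test(head)) {
          return parsers[i];
        }
      }
      return null;
    },
    // Convert one raw row, calling the matching parser once per cell.
    parseRow: function (heads, rawRow, rowIndex) {
      var resultRow = {};
      for (var i = 0; i < rawRow.length; i++) {
        var parser = this.findParser(heads[i]);
        if (parser) {
          // Call with the parser as `this` so that this.regExp is available.
          parser.parse.call(parser, {
            head: heads[i], item: rawRow[i], itemIndex: i,
            rawRow: rawRow, resultRow: resultRow, rowIndex: rowIndex
          });
        } else {
          // Assumed fallback: copy the cell under its column head.
          resultRow[heads[i]] = rawRow[i];
        }
      }
      return resultRow;
    }
  };
}

var registry = createParserRegistry();
registry.addParser("upper", /^\*upper\*/, function (params) {
  var fieldName = params.head.replace(this.regExp, "");
  params.resultRow[fieldName] = params.item.toUpperCase();
});

console.log(registry.parseRow(["id", "*upper*name"], ["1", "joe"], 1));
// { id: '1', name: 'JOE' }
```

The parse function only writes into `params.resultRow`; everything else (iteration, matching, building the row) stays in the registry, which is the separation of concerns this section describes.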
####WebServer
The web server can also be started from code:
var webServer=require("csvtojson").interfaces.web;
var server=webServer.startWebServer({
  "port":"8801",
  "urlpath":"/parseCSV"
});
It returns an http.Server object. (Earlier versions returned an expressjs Application to which you could add your own web app content.)
####Events
The Converter class emits the following events:
- end_parsed: emitted when parsing has finished. The callback receives the JSON object if constructResult is set to true.
- record_parsed: emitted each time a row has been parsed. The callback receives the following parameters: a reference to the result row JSON object, a reference to the original row array, and the index of the current row in the CSV data (the header row does not count, so the first content row has index 0).
To subscribe to the events:
var Converter=require("csvtojson").core.Converter;
var csvConverter=new Converter();
csvConverter.on("end_parsed",function(jsonObj){
  console.log(jsonObj); // the final result
});
csvConverter.on("record_parsed",function(resultRow,rawRow,rowIndex){
  console.log(resultRow); // the row that has just been parsed
});
####Default Parsers
The library ships with the following default parsers:
- Array: for column heads starting with "*array*", e.g. "*array*fieldName", this parser combines the data of cells with the same fieldName into one array.
- Nested JSON: for column heads starting with "*json*", e.g. "*json*my.nested.json.structure", this parser creates a nested JSON structure: my.nested.json
- Nested JSON Array: for column heads starting with "*jsonarray*", e.g. "*jsonarray*my.items", this parser creates a structure like my.items[].
- Omitted column: for column heads starting with "*omit*", e.g. "*omit*id", the parser omits the column's data.
####Example:
Original data:
date,*json*employee.name,*json*employee.age,*json*employee.number,*array*address,*array*address,*jsonarray*employee.key,*jsonarray*employee.key,*omit*id
2012-02-12,Eric,31,51234,Dunno Street,Kilkeny Road,key1,key2,2
2012-03-06,Ted,28,51289,Cambridge Road,Tormore,key3,key4,4
Output data:
{
  "csvRows": [
    {
      "date": "2012-02-12",
      "employee": {
        "name": "Eric",
        "age": "31",
        "number": "51234",
        "key": [
          "key1",
          "key2"
        ]
      },
      "address": [
        "Dunno Street",
        "Kilkeny Road"
      ]
    },
    {
      "date": "2012-03-06",
      "employee": {
        "name": "Ted",
        "age": "28",
        "number": "51289",
        "key": [
          "key3",
          "key4"
        ]
      },
      "address": [
        "Cambridge Road",
        "Tormore"
      ]
    }
  ]
}
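
To make the four header prefixes concrete, the following sketch re-implements their effect on a single row. It is an illustration only, not the library's code: `resolvePath` and `convertRow` are hypothetical names, and treating only "*json*" and "*jsonarray*" heads as dotted paths is an assumption based on the example above.

```javascript
// Illustrative re-implementation of the four header prefixes; not csvtojson's code.

// Walk (and create) nested objects along a dotted path, returning the parent
// object and the final key.
function resolvePath(root, path) {
  var keys = path.split(".");
  var node = root;
  for (var i = 0; i < keys.length - 1; i++) {
    if (typeof node[keys[i]] !== "object" || node[keys[i]] === null) {
      node[keys[i]] = {};
    }
    node = node[keys[i]];
  }
  return { parent: node, key: keys[keys.length - 1] };
}

function convertRow(heads, cells) {
  var row = {};
  heads.forEach(function (head, i) {
    var match = /^\*(array|jsonarray|json|omit)\*(.*)$/.exec(head);
    var kind = match ? match[1] : "plain";
    var name = match ? match[2] : head;
    if (kind === "omit") {
      return; // drop the column entirely
    }
    // Only *json* and *jsonarray* heads are treated as dotted paths.
    var target = (kind === "json" || kind === "jsonarray")
      ? resolvePath(row, name)
      : { parent: row, key: name };
    if (kind === "array" || kind === "jsonarray") {
      // Cells sharing the same field name are collected into one array.
      if (!Array.isArray(target.parent[target.key])) {
        target.parent[target.key] = [];
      }
      target.parent[target.key].push(cells[i]);
    } else {
      target.parent[target.key] = cells[i];
    }
  });
  return row;
}

var heads = "date,*json*employee.name,*json*employee.age,*json*employee.number,*array*address,*array*address,*jsonarray*employee.key,*jsonarray*employee.key,*omit*id".split(",");
var cells = "2012-02-12,Eric,31,51234,Dunno Street,Kilkeny Road,key1,key2,2".split(",");
console.log(JSON.stringify(convertRow(heads, cells), null, 2));
```

Run against the first data row of the example, the sketch produces the same object as the first entry of csvRows in the output data.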
####Big CSV File
The csvtojson library was designed to convert big CSV files. To keep memory consumption low, it is recommended to use a read stream and a write stream.
var Converter=require("csvtojson").core.Converter;
// constructResult:false: do not build the whole result in memory
var csvConverter=new Converter({constructResult:false});
var readStream=require("fs").createReadStream("inputData.csv");
var writeStream=require("fs").createWriteStream("outputData.json");
// stream the CSV in, convert it, and stream the JSON out
readStream.pipe(csvConverter).pipe(writeStream);
Setting constructResult:false tells the converter not to build the final result in memory, which would otherwise consume more and more memory as parsing progresses. The output is piped directly to writeStream.
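
Why streaming keeps memory flat can be seen in a small, self-contained sketch: a Transform that keeps only the header and the unfinished tail of the last chunk, and pushes each row out as soon as it is complete. `RowBuffer` and `CsvToJsonLines` are hypothetical names, the comma split is naive (no quote handling), and the one-JSON-object-per-line output is an assumption made for the example; none of this reflects csvtojson's internals or its actual output format.

```javascript
var Transform = require("stream").Transform;
var StringDecoder = require("string_decoder").StringDecoder;
var util = require("util");

// Hypothetical incremental converter: holds the header and the incomplete
// tail of the last chunk, never the whole file.
function RowBuffer() {
  this.heads = null;
  this.tail = "";
}
// Feed a piece of text; returns one JSON string per completed row.
RowBuffer.prototype.feed = function (text) {
  var lines = (this.tail + text).split("\n");
  this.tail = lines.pop(); // the last piece may be an incomplete line
  var out = [];
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/\r$/, "");
    if (line === "") { continue; }
    var cells = line.split(","); // naive split: no quote handling
    if (!this.heads) { this.heads = cells; continue; }
    var row = {};
    for (var j = 0; j < this.heads.length; j++) { row[this.heads[j]] = cells[j]; }
    out.push(JSON.stringify(row));
  }
  return out;
};

// Transform wrapper so the buffer can sit in a pipe() chain.
function CsvToJsonLines() {
  Transform.call(this);
  this.rows = new RowBuffer();
  this.decoder = new StringDecoder("utf8"); // keeps multi-byte characters intact across chunks
}
util.inherits(CsvToJsonLines, Transform);
CsvToJsonLines.prototype._transform = function (chunk, encoding, callback) {
  var self = this;
  this.rows.feed(this.decoder.write(chunk)).forEach(function (json) { self.push(json + "\n"); });
  callback();
};
CsvToJsonLines.prototype._flush = function (callback) {
  var self = this;
  // A final newline forces out a last row that had no trailing newline.
  this.rows.feed(this.decoder.end() + "\n").forEach(function (json) { self.push(json + "\n"); });
  callback();
};

// Chunk boundaries may fall anywhere, even in the middle of a row.
var demo = new RowBuffer();
console.log(demo.feed("a,b\n1,"));
// []
console.log(demo.feed("2\n3,4"));
// [ '{"a":"1","b":"2"}' ]
```

In a pipeline it would be used like the converter above, for example `readStream.pipe(new CsvToJsonLines()).pipe(writeStream)`. At any moment only one chunk and one partial line are held, regardless of the size of the input.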
####Convert Big CSV File with Command line tool
The csvtojson command line tool supports streaming in a big CSV file and streaming out a JSON file.
This makes it convenient for processing big CSV files of any kind. It has been shown to process CSV files of over 3,000,000 lines (over 500MB) with memory usage under 30MB.
Once you have installed csvtojson, you can use the tool with the command:
csvtojson [path to bigcsvdata] > converted.json
Or if you prefer streaming data in from another application:
cat [path to bigcsvdata] | csvtojson > converted.json
Both commands do the same job.
####Column Array
To convert CSV data to column arrays, you have to construct the result in memory yourself. See the example below:
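var Converter=require("csvtojson").core.Converter;
var fs=require("fs");
var columnArrData=__dirname+"/data/columnArray";
var rs=fs.createReadStream(columnArrData);
var result = {};
var csvConverter=new Converter();
// end_parsed fires once all rows are parsed; result then holds one array per column
csvConverter.on("end_parsed", function(jsonObj) {
  console.log(result);
  console.log("Finished parsing");
});
// record_parsed fires for every row; store each cell in its column's array
csvConverter.on("record_parsed", function(resultRow, rawRow, rowIndex) {
  for (var key in resultRow) {
    // parentheses are required: "!result[key] instanceof Array" is always false
    if (!(result[key] instanceof Array)) {
      result[key] = [];
    }
    result[key][rowIndex] = resultRow[key];
  }
});
rs.pipe(csvConverter);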
Here is an example of the input data:
TIMESTAMP,UPDATE,UID,BYTES SENT,BYTES RCVED
1395426422,n,10028,1213,5461
1395426422,n,10013,9954,13560
1395426422,n,10109,221391500,141836
1395426422,n,10007,53448,308549
1395426422,n,10022,15506,72125
It will be converted to:
{
  "TIMESTAMP": ["1395426422", "1395426422", "1395426422", "1395426422", "1395426422"],
  "UPDATE": ["n", "n", "n", "n", "n"],
  "UID": ["10028", "10013", "10109", "10007", "10022"],
  "BYTES SENT": ["1213", "9954", "221391500", "53448", "15506"],
  "BYTES RCVED": ["5461", "13560", "141836", "308549", "72125"]
}
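
The accumulation step can also be tried in isolation, without reading a file. The sketch below applies the same logic as the record_parsed handler in this section to two of the sample rows; `addRowToColumns` is a hypothetical helper name, and the rows are written out by hand rather than produced by the converter.

```javascript
// Hypothetical helper with the same logic as the record_parsed handler.
function addRowToColumns(columns, resultRow, rowIndex) {
  Object.keys(resultRow).forEach(function (key) {
    if (!Array.isArray(columns[key])) {
      columns[key] = [];
    }
    // Store by row index so every column keeps the original row order.
    columns[key][rowIndex] = resultRow[key];
  });
  return columns;
}

// The first two sample rows, as they would arrive in record_parsed.
var rows = [
  { "TIMESTAMP": "1395426422", "UPDATE": "n", "UID": "10028", "BYTES SENT": "1213", "BYTES RCVED": "5461" },
  { "TIMESTAMP": "1395426422", "UPDATE": "n", "UID": "10013", "BYTES SENT": "9954", "BYTES RCVED": "13560" }
];
var columns = {};
rows.forEach(function (row, index) { addRowToColumns(columns, row, index); });
console.log(columns.UID);
// [ '10028', '10013' ]
```

Because cells are stored at `rowIndex`, the arrays have no empty first slot only when the index starts at 0, which is the behaviour of record_parsed since version 0.3.2 according to the Change Log.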
####Parse String
To parse a string, simply call the fromString(csvString,callback) method.
For example:
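var Converter=require("csvtojson").core.Converter;
var fs=require("fs");
var testData=__dirname+"/data/testData";
var data=fs.readFileSync(testData).toString();
var csvConverter=new Converter();
csvConverter.on("end_parsed", function(jsonObj) {
  // handle the end_parsed event here if needed
});
csvConverter.fromString(data,function(err,jsonObj){
  if (err){
    // handle the parse error
    return console.error(err);
  }
  console.log(jsonObj);
});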
#Change Log
##0.3.5
- Added fromString method to support direct string input
##0.3.4
- Added more parameters to command line tool.
##0.3.2
- Added quote in parameter to support quoted column content containing delimiters
- Changed row index starting from 0 instead of 1 when populated from record_parsed event
##0.3
- Removed all dependencies
- Deprecated applyWebServer
- Added construct parameter for Converter Class
- Converter Class now works as a proper stream object