ch.zizka.csvcruncher:csv-cruncher
A tool which treats CSV and JSON files as SQL tables: it performs SQL SELECT queries on them and stores the results back into CSV or JSON files.
Many tools and systems can export their data to CSV (comma-separated values). The typical way to work with such data is to import it into a spreadsheet like Excel and process it manually. Web services typically produce results as JSON arrays of objects; reviewing these in raw JSON is not very human-friendly, and processing them with jq can get quite complicated.
With CSV Cruncher, you can automate this processing by writing SQL queries which produce another CSV or JSON file as a result. The SQL operations can be quite powerful - just consider everything you have at hand:
On top of that, CSV Cruncher can:
And this is planned:
All this is backed by the HyperSQL database. See its very rich SQL syntax and features documentation.
Let's download a JSON file with the episodes of Narcos and pick the best 3 in season 2.
wget -O narcos.json 'http://api.tvmaze.com/singlesearch/shows?q=narcos&embed=episodes'
cruncher/crunch -in narcos.json -itemsAt '_embedded/episodes' -out narcos.csv \
-sql 'SELECT season, number, name FROM $table WHERE season = 2 ORDER BY rating.average DESC LIMIT 3'
open narcos.csv
Download from the Maven repo or the latest release page and unzip.
## Install...
wget "https://repo1.maven.org/maven2/ch/zizka/csvcruncher/csv-cruncher/2.6.0/csv-cruncher-2.6.0.zip"
unzip csv-cruncher-*.zip
mv csv-cruncher-*-dist cruncher
Example run - the crunch script, which calls Java:
cruncher/crunch -in narcos.json -itemsAt '_embedded/episodes' -out narcos.csv -sql 'SELECT season, number, name FROM $table WHERE rating.average > 8'
Add CsvCruncher to the PATH, e.g. by linking it from a bin directory:
ln -s $(pwd)/cruncher/crunch ~/.local/bin/crunch
Requires Java 11 or later.
If you run java -jar csv-cruncher-single.jar directly, do not add crunch.
You might need to make the crunch script executable, depending on your OS (until issue #): chmod +x crunch
CsvCruncher has imports and exports. Each import configuration starts with -in, and each export with -out. Both need a filesystem path to read from or write to, respectively, and both have further options. Some import options may also be taken from defaults, which are configured after -all.
./crunch [<global options...>]
-in <file.csv> [-as <alias>] [--format=JSON|CSV] [-indexed column1,column2,...] [other options...]
-in <file.json> [-as ...] [-itemsAt /path/in/json/tree] [other options...]
-out <resultFile.csv> [-sql <SQL query>] [--format=JSON|CSV] [other options...]
-out ...
-all [<default or global options>]
Some options below are planned to be per-import or per-export. Currently, most are applied to all imports and exports. Leave a comment in the respective GitHub issues if per-import/export configuration is important for you.
-in
Takes input files (a .json suffix means JSON), or directories with such files; see --combineInputs. Consider indexing columns used in JOIN, WHERE and GROUP BY clauses. See -indexed ...

-out
The file to write the result to. If the suffix is .json, the output is JSON. If the path is - (minus), the result is printed to standard output ("STDOUT").

-sql
The SQL SELECT to be performed.
-db <pathToDatabaseDirectory>
The directory for the working database; defaults to hsqldb/cruncher.

--logLevel=<TRACE|DEBUG|INFO|WARN|ERROR|OFF>
Controls which log lines will appear.
--include=<regex>, --exclude=<regex>
Filter which input files are used. The regex is matched against the path, so you may need .* at the beginning. --exclude is applied after --include, so include does not override excluded files. The defaults behave as if --include was .*\.csv$ and --exclude had no match.

--ignoreLinesMatching=<regEx>
Skips input lines matching the regex.
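The include-then-exclude order can be emulated with grep. A minimal sketch (the file names are made up, and grep stands in for CsvCruncher's internal filtering):

```shell
# --include keeps matching paths first; --exclude then removes files from
# that set, so an excluded file can never sneak back in via --include.
printf '%s\n' data/a.csv data/b.csv data/archive/old.csv notes.txt \
  | grep -E '.*\.csv$' \
  | grep -Ev 'archive'
# Prints data/a.csv and data/b.csv
```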
--ignoreFirstLines[=<number>]
Skips the first number lines; the first line is considered a header with column names. See also ignoreLineRegex.

--combineInputs[=concat|intersect|substract]

--combineDirs[=perDir|perInputDir|perInputSubdir|all]
Defaults to all.

--sortInputs[=paramsOrder|alpha|time]
-indexed <columns>
Creates indexes on the given columns. Read the logs or use -sql SELECT ... FROM INFORMATION_SCHEMA.* to study the schema created after preprocessing.
--rowNumbers[=<firstNumber>|remember]
Adds a column crunchCounter to the output with a unique and incrementing number for each row. With <firstNumber>, the first number to be used can be set. remember is yet to be implemented, and will continue where the last run stopped.

--json[=entries|array]
entries (default) will create a JSON entry per line, representing the original rows. array will create a file with a JSON array ([...,...]).

This README may be slightly obsolete; for a full list of options, check the Options class.
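To illustrate the difference between the two --json modes, here is a plain-shell sketch of the same two rows in each shape (the row content is made up):

```shell
# entries: one standalone JSON object per line (NDJSON-style)
printf '%s\n' '{"season":2,"number":1}' '{"season":2,"number":2}' > entries.json

# array: the same rows joined with commas and wrapped in [...]
printf '[%s]\n' "$(paste -sd, entries.json)" > array.json

cat array.json
# Prints [{"season":2,"number":1},{"season":2,"number":2}]
```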
Simple SQL SELECT on a single CSV file, producing CSV and JSON:
./crunch -in myInput.csv -out output.csv
-sql "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM myInput ORDER BY duration LIMIT 2 OFFSET 6)"
--json
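What the nested query does can be mimicked with coreutils. A sketch with made-up durations, not the tool itself:

```shell
# Inner SELECT: order by duration, skip 6 rows (OFFSET 6), take 2 (LIMIT 2);
# outer SELECT: average the surviving durations.
printf '%s\n' 5 3 9 1 7 2 8 4 \
  | sort -n \
  | sed -n '7,8p' \
  | awk '{ sum += $1 } END { print sum / NR }'
# The two rows kept are 8 and 9, so this prints 8.5
```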
With input files searched in a directory and concatenated into one table:
./crunch
-in src/test/data/sampleMultiFilesPerDir/apollo_session/
-out target/results/result.csv
--json=entries
--rowNumbers
--combineInputs=concat
--combineDirs=all
-sql 'SELECT session_uid, name, session_type, created_time, modified_date
FROM concat_1 ORDER BY session_type, created_time DESC'
With input files searched in subdirectories of a directory, concatenated, and used as table-per-subdirectory:
(Supported, but example to be added)
Suppose you have a CSV file named eapData.csv:
## jobName, buildNumber, config, archivePath, arFile, deployDur, warmupDur, scale
'eap-5.1.0-perf-deployers', 355,'production','testdata/war/hellothere.war','hellothere.war',10282,14804,1000
'eap-5.1.0-perf-deployers', 355,'production','testdata/ear/EarWithWar-Counter.ear','EarWithWar-Counter.ear',11005,18904,1000
'eap-5.1.0-perf-deployers', 355,'production','testdata-own/war/war-big-1.0.war','war-big-1.0.war',1966,14800,100
...
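To follow along, you can recreate a small version of this file with a here-document (two rows copied from the sample above):

```shell
cat > eapData.csv <<'EOF'
## jobName, buildNumber, config, archivePath, arFile, deployDur, warmupDur, scale
'eap-5.1.0-perf-deployers', 355,'production','testdata/war/hellothere.war','hellothere.war',10282,14804,1000
'eap-5.1.0-perf-deployers', 355,'production','testdata/ear/EarWithWar-Counter.ear','EarWithWar-Counter.ear',11005,18904,1000
EOF
wc -l < eapData.csv
# Prints 3
```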
Passing it to CsvCruncher makes a table named eapData available, so you can issue an SQL query such as:
SELECT jobName, buildNumber, config, ar, arFile, deployDur, warmupDur, scale,
CAST(warmupDur AS DOUBLE) / CAST(deployDur AS DOUBLE) AS warmupSlower
FROM eapData ORDER BY deployDur
To do that, run this command:
crunch -in eapData.csv -out eap-statistics.json -sql "SELECT jobName, ... FROM eapData ..." --json=entries
Notice the .json suffix, which tells CsvCruncher to produce JSON; --json=entries then formats it as one entry per line rather than as an array.
{"crunchcounter":106252117707,"jobName":"'eap-5.1.0-perf-deployers'","buildNumber":" 355","config":"'production'","ar":"'testdata/war/hellothere.war'","arFile":"'hellothere.war'","deployDur":"10282","warmupDur":"14804","scale":"1000","warmupslower":1.4397977047267068}
{"crunchcounter":106252117708,"jobName":"'eap-5.1.0-perf-deployers'","buildNumber":" 355","config":"'production'","ar":"'testdata/ear/EarWithWar-Counter.ear'","arFile":"'EarWithWar-Counter.ear'","deployDur":"11005","warmupDur":"18904","scale":"1000","warmupslower":1.7177646524307133}
{"crunchcounter":106252117709,"jobName":"'eap-5.1.0-perf-deployers'","buildNumber":" 355","config":"'production'","ar":"'testdata-own/war/war-big-1.0.war'","arFile":"'war-big-1.0.war'","deployDur":"1966","warmupDur":"14800","scale":"100","warmupslower":7.527975584944048}
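Since each line of the entries output is a standalone JSON document, plain line-oriented tools can consume it. A sketch that pulls two fields from a shortened copy of the first row above and re-checks the computed warmupslower = warmupDur / deployDur ratio (sed stands in for a real JSON parser here):

```shell
line='{"deployDur":"10282","warmupDur":"14804","warmupslower":1.4397977047267068}'
deploy=$(printf '%s\n' "$line" | sed -E 's/.*"deployDur":"([0-9]+)".*/\1/')
warmup=$(printf '%s\n' "$line" | sed -E 's/.*"warmupDur":"([0-9]+)".*/\1/')
# warmupslower is the ratio of the two durations:
awk -v w="$warmup" -v d="$deploy" 'BEGIN { printf "%.4f\n", w / d }'
# Prints 1.4398
```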
I develop this project occasionally, when I need it - which has been surprisingly often in the last 10 years, for various reasons. That, however, means it is developed in isolated streaks, and it lacks features I do not need myself. I try to avoid bugs by covering the promised features with tests, but it's far from complete coverage.
2022-11-27: Preparing 2.5.x - reading from spreadsheets (Excel/XLS, LibreOffice/ODS, etc.)
2022-11-25: Release 2.4.1
2022-10-xx: Release 2.4.0
2021-11-12: Release 2.3.6
2021-11-11: Release 2.3.0
2021-11-07: Release 2.2.0
2021-11-06: Release 2.1.0 - SELECT foo.bar.baz FROM myJson is possible.
2021-11-06: Version 2.0.0 has a reworked argument syntax - see the Options section.
2021-11-05: Release 1.31.1
2021-05-02: Release 1.14.0
If you use this in your project, beware:
Easter Egg: The original text I sent to JBoss mailing list when introducing the tool in 2011 :)
Hi,
I've crafted a tool which transforms a CSV file into another CSV file using a SQL statement (as I did not find such).
Can be useful for various quick'n'dirty integration during test automation, esp. for data-oriented tasks like perf-tests.
Many tools spit out CSV, or are able to as one of output options. Also, in Hudson, you can very simply log any values you get into bash like echo " $val2, $val2" >> data.csv, for each build or part of a build. So it can be kind of integration tool.
Then you can do quite complex queries - from a flat table, you can actually do subselects and then left joins, which gives you a very powerful tool to process the data into something that is ready for plotting as-is - that means data filtered, cleaned, aggregated, converted, aligned, sorted, etc.
That might be my POV, since I like SQL and it's my favorite language not only for querying but also data-oriented procedural programming. But nonetheless, I already shortened my perf test task by ~ 40 minutes of my work for each release. Instead of manual shenanigans in OpenOffice, I run a single command, and voila ;-)
HSQL's syntax: http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html (I was very surprised by HSQL's features, it supports much more of SQL than e.g. MySQL.)
Enjoy :)