csvchk
Vertical display of delimited data
This program will show you the first record of a delimited text file transposed vertically.
It is meant to complement the many features of the csvkit
tools.
For example, given a file like this:
$ csvlook test/test.csv
| id | val |
| -- | --- |
| 1 | foo |
| 2 | bar |
This program will show:
// ****** Record 1 ****** //
id : 1
val : foo
Usage and options
Run with -h
or --help
for a full usage:
$ ./csvchk.py -h
usage: csvchk.py [-h] [-s sep] [-f names] [-l nrecs] [-L nrecs] [-g grep] [-d]
[-n] [-N] [-e encode] [--version]
FILE [FILE ...]
Check a delimited text file
positional arguments:
FILE Input file(s)
optional arguments:
-h, --help show this help message and exit
-s sep, --sep sep Field separator (default: )
-f names, --fieldnames names
Field names (no header) (default: )
-l nrecs, --limit nrecs
How many records to show (default: 1)
-L nrecs, --field-limit nrecs
How many fields to show (default: 0)
-g grep, --grep grep Only show records with a given value (default: )
-d, --dense Not sparse (skip empty fields) (default: False)
-n, --number Show field number (e.g., for awk) (default: False)
-N, --noheaders No headers in first row (default: False)
-e encode, --encoding encode
File encoding (default: utf-8)
--version show program's version number and exit
Separator
The default field separator is a tab character unless the input file has the extension .csv
.
You can change this value using the -s
or --sep
option.
For example, given this file:
$ cat test/test2.txt
id:val
1:foo
2:bar
You could run:
$ csvchk -s ':' test/test2.txt
// ****** Record 1 ****** //
id : 1
val : foo
Field names
The input file is assumed to contain column headers/field names in the first row.
If a file has no such headers, you can provide a comma-separated string with -f
or --fieldnames
of values to use instead.
For example, given this file:
$ cat test/nohdr.csv
1,foo
2,bar
You can run:
$ csvchk -f 'id, value' test/nohdr.csv
// ****** Record 1 ****** //
id : 1
value : foo
Limit
By default, the program will use the -l
or --limit
value of 1
to show the first record.
You can increase this, for example:
$ csvchk -l 2 test/test.csv
// ****** Record 1 ****** //
id : 1
val : foo
// ****** Record 2 ****** //
id : 2
val : bar
To see all the records, use a negative value like -1
:
$ csvchk -l -1 test/test.csv
// ****** Record 1 ****** //
id : 1
val : foo
// ****** Record 2 ****** //
id : 2
val : bar
// ****** Record 3 ****** //
id : 3
val : baz
Dense output
By default, all fields and values will be shown for each record.
For example, given this file:
$ cat test/sparse.csv
id,val
1,foo
2,
,baz
This will be shown:
$ csvchk test/sparse.csv -l -1
// ****** Record 1 ****** //
id : 1
val : foo
// ****** Record 2 ****** //
id : 2
val :
// ****** Record 3 ****** //
id :
val : baz
You can use the -d
or --dense
option to omit fields that have no values:
$ csvchk test/sparse.csv -l -1 -d
// ****** Record 1 ****** //
id : 1
val : foo
// ****** Record 2 ****** //
id : 2
// ****** Record 3 ****** //
val : baz
Numbering fields
The -n
or --number
option will append the field numbers before the output:
$ csvchk -n test/test.tab
// ****** Record 1 ****** //
1 id : 1
2 val : foo
This can be useful if you would like to know the field number to use with awk
, e.g., we could look for records where the val
column (in the second position) has an "a":
$ awk '$2 ~ /a/' test/test.tab
id val
2 bar
If the input file does not have headers (column names) in the first row, you can use the -N
or --noheaders
option to have the program create names like "Field1," "Field2," etc.:
$ csvchk -N test/nohdr.csv
// ****** Record 1 ****** //
Field1 : 1
Field2 : foo
Filter by record contents
You can use the -g
or --grep
option to view only records containing a string:
$ csvchk -g ba -l 2 tests/test.csv
// ****** Record 1 ****** //
id : 2
val : bar
// ****** Record 2 ****** //
id : 3
val : baz
Multiple file inputs
If given multiple files as inputs, the program will insert a header noting the basename of each file:
$ csvchk test/test.csv test/test.tab
==> test.csv <==
// ****** Record 1 ****** //
id : 1
val : foo
==> test.tab <==
// ****** Record 1 ****** //
id : 1
val : foo
Duplicate Column Names
Duplicate column names will have a suffix of _<num>
starting at the second occurrence.
For instance, this file:
$ cat tests/duplicate_cols.csv
name,age,age
Keith,42,42
Jorge,35,35
Geoffrey,51,51
Will produce this output:
$ csvchk tests/duplicate_cols.csv
// ****** Record 1 ****** //
name : Keith
age : 42
age_2 : 42
Limiting the Columns Shown
You may wish to limit the number of columns shown using the -L|--field-limit
option:
$ csvchk --field-limit 1 tests/test.csv
// ****** Record 1 ****** //
id : 1
Author
Ken Youens-Clark kyclark@gmail.com