csv2parquet

A tool to convert CSVs to Parquet files

Version 0.0.9 on PyPI

Convert a CSV file to a Parquet file. You may also find sqlite-parquet-vtable or parquet-metadata useful.

Installing

If you just want to use the tool:

sudo pip install pyarrow csv2parquet

If you want to clone the repo and work on the tool, install its dependencies via pipenv:

pipenv install

Usage

Run csv2parquet on an input file to create a Parquet file. The tool supports both CSV and TSV input.
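As a rough illustration, the sketch below assembles a csv2parquet command line in Python, using only flags that appear in the tool's help output. The file names (people.csv, people.parquet), the column names, and the helper build_command are hypothetical. The command is printed rather than executed, so the sketch does not depend on the tool being installed. It also assumes the tool follows standard argparse behavior, which the format of the help text suggests but does not confirm.

```python
import shlex


def build_command(csv_file, output=None, codec=None, rows=None, types=None):
    """Assemble an argv list for csv2parquet (hypothetical helper)."""
    argv = ["csv2parquet"]
    if rows is not None:
        argv += ["--rows", str(rows)]
    if output is not None:
        argv += ["--output", output]
    if codec is not None:
        argv += ["--codec", codec]
    # Place the input file before --type: that flag accepts a variable
    # number of values and, under standard argparse rules (an assumption
    # here), would otherwise consume the file name as one more value.
    argv.append(csv_file)
    if types:
        argv.append("--type")
        argv += [f"{column}={type_name}" for column, type_name in types.items()]
    return argv


# Hypothetical input: convert the first 100 rows of people.csv.
cmd = build_command(
    "people.csv",
    output="people.parquet",
    codec="snappy",
    rows=100,
    types={"person_age": "int8", "0": "bool?"},
)

# shlex.join quotes arguments such as 0=bool? so that the shell does not
# treat the question mark as a glob character.
print(shlex.join(cmd))
```

Quoting matters for the lenient form of a type (the trailing question mark), since an unquoted ? may be expanded by the shell before csv2parquet sees it.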

usage: csv2parquet [-h] [-n ROWS] [-r ROW_GROUP_SIZE] [-o OUTPUT] [-c CODEC]
                   [-i INCLUDE [INCLUDE ...] | -x EXCLUDE [EXCLUDE ...]]
                   [-R RENAME [RENAME ...]] [-t TYPE [TYPE ...]]
                   csv_file

positional arguments:
  csv_file              input file, can be CSV or TSV

optional arguments:
  -h, --help            show this help message and exit
  -n ROWS, --rows ROWS  The number of rows to include, useful for testing.
  -r ROW_GROUP_SIZE, --row-group-size ROW_GROUP_SIZE
                        The number of rows per row group.
  -o OUTPUT, --output OUTPUT
                        The parquet file
  -c CODEC, --codec CODEC
                        The compression codec to use (brotli, gzip, snappy,
                        zstd, none)
  -i INCLUDE [INCLUDE ...], --include INCLUDE [INCLUDE ...]
                        Include the given columns (by index or name)
  -x EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        Exclude the given columns (by index or name)
  -R RENAME [RENAME ...], --rename RENAME [RENAME ...]
                        Rename a column. Specify the column to be renamed and
                        its new name, eg: 0=age or person_age=age
  -t TYPE [TYPE ...], --type TYPE [TYPE ...]
                        Parse a column as a given type. Specify the column and
                        its type, eg: 0=bool? or person_age=int8. Parse errors
                        are fatal unless the type is followed by a question
                        mark. Valid types are string (default), base64, bool,
                        float32, float64, int8, int16, int32, int64, timestamp
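The --type syntax described above (a column, an equals sign, a type name, and an optional trailing question mark) can be illustrated with a small parser. This is a sketch of the documented format only, not csv2parquet's actual implementation: parse_type_spec and TypeSpec are hypothetical names, and treating an all-digit column as an index is an assumption based on the "by index or name" wording used for --include and --exclude.

```python
from typing import NamedTuple, Union

# Type names copied from the help text.
VALID_TYPES = {
    "string", "base64", "bool", "float32", "float64",
    "int8", "int16", "int32", "int64", "timestamp",
}


class TypeSpec(NamedTuple):
    column: Union[int, str]  # column index or column name
    type_name: str           # one of VALID_TYPES
    lenient: bool            # True when the type was followed by "?"


def parse_type_spec(spec: str) -> TypeSpec:
    """Parse a spec such as '0=bool?' or 'person_age=int8' (hypothetical helper)."""
    # Split on the last "=" so a column name containing "=" stays intact
    # (an assumption; the help text does not cover that case).
    column, sep, type_name = spec.rpartition("=")
    if not sep or not column or not type_name:
        raise ValueError(f"expected COLUMN=TYPE, got {spec!r}")

    # A trailing question mark marks parse errors as non-fatal.
    lenient = type_name.endswith("?")
    if lenient:
        type_name = type_name[:-1]

    if type_name not in VALID_TYPES:
        raise ValueError(f"unknown type {type_name!r} in {spec!r}")

    # Assumption: an all-digit column is an index, anything else a name.
    parsed_column = int(column) if column.isdecimal() else column
    return TypeSpec(parsed_column, type_name, lenient)


print(parse_type_spec("0=bool?"))
print(parse_type_spec("person_age=int8"))
```

The lenient field corresponds to the rule in the help text that parse errors are fatal unless the type is followed by a question mark. What the tool writes for a value that fails to parse in lenient mode is not stated there, so the sketch does not model it.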

Testing

pylint csv2parquet
pytest
