grizz
Overview
grizz is a lightweight library to ingest and transform data in polars DataFrames.
grizz uses an object-oriented strategy, where ingestors and transformers are building blocks that
can be combined.
grizz can be extended with custom DataFrame ingestors and transformers.
For example, the following snippet shows how to cast some columns to a different data type.
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
Documentation
- latest (stable): documentation from the latest stable release.
- main (unstable): documentation associated with the main branch of the repo. This documentation
  may contain many work-in-progress, outdated, or missing parts.
Installation
We highly recommend installing grizz in a virtual environment.
grizz can be installed with pip using the following command:
pip install grizz
To make the package as slim as possible, only the minimal packages required to use grizz are
installed.
To include all the dependencies, you can use the following command:
pip install grizz[all]
Please check the get started page to see how to install only specific dependencies, or for
alternative ways to install the library.
The following table lists the grizz versions and their corresponding dependencies.
| grizz | coola        | iden         | objectory  | polars     | python      |
|-------|--------------|--------------|------------|------------|-------------|
| main  | >=0.8.5,<1.0 | >=0.1.0,<1.0 | >=0.2,<1.0 | >=1.0,<2.0 | >=3.9,<3.14 |
| 0.1.1 | >=0.8.5,<1.0 | >=0.1.0,<1.0 | >=0.2,<1.0 | >=1.0,<2.0 | >=3.9,<3.14 |
| 0.1.0 | >=0.8.4,<1.0 | >=0.1.0,<1.0 | >=0.2,<1.0 | >=1.0,<2.0 | >=3.9,<3.14 |
| 0.0.5 | >=0.7,<1.0   | >=0.0.4,<1.0 | >=0.1,<1.0 | >=1.0,<2.0 | >=3.9,<3.13 |
| 0.0.4 | >=0.7,<1.0   | >=0.0.4,<1.0 | >=0.1,<1.0 | >=1.0,<2.0 | >=3.9,<3.13 |
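As a rough way to compare an existing environment against the table above, the sketch below reads installed versions with the standard library and checks them against the ranges from the 0.1.1 row. The helper names (`parse_release`, `in_range`, `REQUIREMENTS`) are made up for this example, and the comparison is deliberately simplified: it only looks at the leading numeric components and ignores pre-release or post-release tags. For a rigorous check, a dedicated tool such as the `packaging` library is a better choice.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Ranges copied from the 0.1.1 row of the table: (inclusive lower, exclusive upper).
REQUIREMENTS = {
    "coola": ("0.8.5", "1.0"),
    "iden": ("0.1.0", "1.0"),
    "objectory": ("0.2", "1.0"),
    "polars": ("1.0", "2.0"),
}


def parse_release(text: str) -> tuple[int, ...]:
    """Keep the leading numeric components, e.g. '1.12.0rc1' -> (1, 12, 0)."""
    parts: list[int] = []
    for chunk in text.split("."):
        digits = ""
        for char in chunk:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(chunk):
            # A suffix such as 'rc1' ends the numeric part of the version.
            break
    return tuple(parts)


def in_range(installed: str, lower: str, upper: str) -> bool:
    """Return True if lower <= installed < upper, comparing numeric parts only."""
    releases = [parse_release(item) for item in (lower, installed, upper)]
    size = max(len(release) for release in releases)
    # Pad with zeros so that '1' and '1.0' compare as equal.
    low, current, high = (r + (0,) * (size - len(r)) for r in releases)
    return low <= current < high


for name, (lower, upper) in REQUIREMENTS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    status = "ok" if in_range(installed, lower, upper) else "outside the listed range"
    print(f"{name} {installed}: {status} (expected >={lower},<{upper})")
```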
Optional dependencies
| grizz | clickhouse-connect * | pyarrow *    | tqdm *      |
|-------|----------------------|--------------|-------------|
| main  | >=0.7,<1.0           | >=10.0,<19.0 | >=4.65,<5.0 |
| 0.1.1 | >=0.7,<1.0           | >=10.0,<19.0 | >=4.65,<5.0 |
| 0.1.0 | >=0.7,<1.0           | >=10.0,<18.0 | >=4.65,<5.0 |
| 0.0.5 | >=0.7,<1.0           | >=10.0,<18.0 | >=4.65,<5.0 |
| 0.0.4 | >=0.7,<1.0           | >=10.0,<17.0 | >=4.65,<5.0 |
* indicates an optional dependency
Contributing
Please check the instructions in CONTRIBUTING.md.
Suggestions and Communication
Everyone is welcome to contribute to the community.
If you have any questions or suggestions, you can submit GitHub Issues.
We will reply to you as soon as possible. Thank you very much.
API stability
:warning: While grizz is in the development stage, no API is guaranteed to be stable from one
release to the next.
In fact, it is very likely that the API will change multiple times before a stable 1.0.0 release.
In practice, this means that upgrading grizz to a new version may break code that was written
against an older version of grizz.
License
grizz is licensed under the BSD 3-Clause "New" or "Revised" license, available in the LICENSE
file.