CubedPandas
Filter faster, analyze smarter – your DataFrames deserve it!
CubedPandas offer a new easy, fast & fun approach to filter, navigate and analyze Pandas dataframes.
CubedPandas is inspired by the concept of OLAP databases
and aims to bring add comfort and power to Pandas dataframe handling.
For novice users, CubedPandas can be a great help to get started with Pandas, as it hides
the complexity and verbosity of Pandas dataframes. For experienced users, CubedPandas
can be a productivity booster, as it allows you to write more compact, explicit, readable and
maintainable code, e.g. this Pandas code:
value = df.loc[
(df['make'] == 'Audi') &
(df['engine'] == 'hybrid') &
(df['date'] >= '2024-09-01') & (df['date'] <= '2024-09-30'),
'revenue'
].sum()
can turn into this equivalent CubedPandas code:
value = df.cubed.make.Audi.engine.hybrid.date.september_2024.revenue
value = df.cubed.Audi.hybrid.sep_2024
df = df.cubed.make.Audi.engine.hybrid.df
CubedPandas offers a fluent interface based on the data available in the underlying DataFrame.
So, filtering, navigation and analysis of Pandas dataframes becomes more intuitive, more readable and more fun.
CubedPandas neither duplicates data nor modifies the underlying DataFrame, and it introduces
no performance penalty. In fact, it can sometimes significantly speed up your data processing.
Jupyter notebooks is the perfect habitat for CubedPandas. For further information,
please visit the CubedPandas Documentation
or try some of the included samples.
Getting Started
CubedPandas is available on pypi.org (https://pypi.org/project/cubedpandas/) and can be installed by
pip install cubedpandas
Using CubedPandas is as simple as wrapping any Pandas dataframe into a cube like this:
import pandas as pd
from cubedpandas import cubed
df = pd.DataFrame({"product": ["Apple", "Pear", "Banana", "Apple", "Pear", "Banana"],
"channel": ["Online", "Online", "Online", "Retail", "Retail", "Retail"],
"customer": ["Peter", "Peter", "Paul", "Paul", "Mary", "Mary" ],
"mailing": [True, False, True, False, True, False ],
"revenue": [100, 150, 300, 200, 250, 350 ],
"cost": [50, 90, 150, 100, 150, 175 ]})
cdf = cubed(df)
CubedPandas automatically infers a multi-dimensional schema from your Pandas dataframe which
defines a virtual Cube over the dataframe. By default, numeric columns of the dataframe
are considered as Measures - the numeric values to analyse & aggregate - all other columns are
considered as Dimensions - to filter, navigate and view the data. The individual values in a
dimension column are called the Members of the dimension. In the example above, column channel
becomes a dimension with the two members Online
and Retail
, revenue
and cost
are our measures.
Although rarely required, you can also define your own schema. Schemas are quite powerful and flexible,
as they will allow you to define dimensions and measures, aliases and (planned for upcoming releases)
also custom aggregations, business logic, number formating, linked cubes (star-schemas) and much more.
Context please, so I will give you data!
One key feature of CubePandas is an easy & intuitive access to individual Data Cells in
multi-dimensional data space. To do so, you'll need to define a multi-dimensional Context so
CubedPandas will evaluate, aggregate (sum
by default) and return the requested value from
the underlying dataframe.
Context objects behave like normal numbers (float, int), so you can use them directly in arithmetic
operations. In the following examples, all addresses will refer to the exactly same rows from the dataframe
and thereby all return the same value of 100
.
a = df.loc[(df["product"] == "Apple") & (df["channel"] == "Online") & (df["customer"] == "Peter"), "revenue"].sum()
b = cdf["product:Apple", "channel:Online", "customer:Peter"].revenue
c = cdf.product["Apple"].channel["Online"].customer[
"Peter"].revenue
d = cdf.product.Apple.channel.Online.customer.Peter.revenue
e = cdf["Online", "Apple", "Peter", "revenue"]
f = cdf.Online.Apple.Peter.revenue
g = cdf.Online.Apple.Peter
assert a == b == c == d == e == f == g == 100
Context objects also act as filters on the underlying dataframe. So you can use also CubedPandas for
fast and easy filtering only, e.g. like this:
df = df.cubed.product["Apple"].channel["Online"].df
df = df.cubed.Apple.Online.df
Pivot, Drill-Down, Slice & Dice
The Pandas pivot table is a very powerful tool. Unfortunately, it is quite verbose and very hard to master.
CubedPandas offers the slice
method to create pivot tables in a more intuitive and easy way, e.g. by default
cdf.slice(rows="product", columns="channel", measures="revenue")
For further information, samples and a complete feature list as well as valuable tips and tricks,
please visit the CubedPandas Documentation.
Your feedback, ideas and support are very welcome!
Please help improve and extend CubedPandas with your feedback & ideas and use the
CubedPandas GitHub Issues to request new features and report bugs.
For general questions, discussions and feedback, please use the
CubedPandas GitHub Discussions.
If you have fallen in love with CubedPandas or find it otherwise valuable,
please consider to become a sponsor of the CubedPandas project so we
can push the project forward faster and make CubePandas even more awesome.
...happy cubing!