
Security News
Crates.io Implements Trusted Publishing Support
Crates.io adds Trusted Publishing support, enabling secure GitHub Actions-based crate releases without long-lived API tokens.
A Grammar of Data Manipulation in python
Documentation | Reference Maps | Notebook Examples | API
datar
is a re-imagining of APIs for data manipulation in python with multiple backends supported. Those APIs are aligned with tidyverse packages in R as much as possible.
pip install -U datar
# install with a backend
pip install -U datar[pandas]
# More backends support coming soon
Repo | Badges |
---|---|
datar-numpy | |
datar-pandas | |
datar-arrow |
# with pandas backend
from datar import f
from datar.dplyr import mutate, filter_, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter_, if_else, tibble
df = tibble(
x=range(4), # or c[:4] (from datar.base import c)
y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output
x y z
<int64> <object> <int64>
0 0 zero 0
1 1 one 1
2 2 two 2
3 3 three 3
"""
df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
x y z
<int64> <object> <int64>
0 0 zero 0
1 1 one 0
2 2 two 1
3 3 three 1
"""
df >> filter_(f.x>1)
"""# output:
x y
<int64> <object>
0 2 two
1 3 three
"""
df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter_(f.z==1)
"""# output:
x y z
<int64> <object> <int64>
0 2 two 1
1 3 three 1
"""
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar import f
from datar.base import sin, pi
from datar.tibble import tibble
from datar.dplyr import mutate, if_else
from plotnine import ggplot, aes, geom_line, theme_classic
df = tibble(x=numpy.linspace(0, 2 * pi, 500))
(
df
>> mutate(y=sin(f.x), sign=if_else(f.y >= 0, "positive", "negative"))
>> ggplot(aes(x="x", y="y"))
+ theme_classic()
+ geom_line(aes(color="sign"), size=1.2)
)
# very easy to integrate with other libraries
# for example: klib
import klib
from pipda import register_verb
from datar import f
from datar.data import iris
from datar.dplyr import pull
dist_plot = register_verb(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
Thanks for your excellent package to port R (
dplyr
) flow of processing to Python. I have been using other alternatives, and yours is the one that offers the most extensive and equivalent to what is possible now withdplyr
.
FAQs
A Grammar of Data Manipulation in python
We found that datar demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Crates.io adds Trusted Publishing support, enabling secure GitHub Actions-based crate releases without long-lived API tokens.
Research
/Security News
Undocumented protestware found in 28 npm packages disrupts UI for Russian-language users visiting Russian and Belarusian domains.
Research
/Security News
North Korean threat actors deploy 67 malicious npm packages using the newly discovered XORIndex malware loader.