buckaroo-data
Cleaning data in Jupyter notebooks is awkward: multiple cells of exploratory work, trying and looking up different transforms, and ad hoc functions that work in one notebook and then have to be either copy-pasted into the next notebook or rewritten from scratch. Buckaroo makes all of that better by providing a visual UI for common cleaning operations and emitting Python code that performs the transformation. Specifically, Buckaroo is a tool built to interactively explore, clean, and transform pandas dataframes.
If using JupyterLab, buckaroo requires JupyterLab version 3 or higher.
You can install buckaroo using pip:
pip install buckaroo
To get started with using Buckaroo, check out the full documentation:
https://buckaroo-data.readthedocs.io/en/latest/
In a JupyterLab notebook, add the following to a cell:
from buckaroo.buckaroo_widget import BuckarooWidget
BuckarooWidget(df=df) #df being the dataframe you want to explore
You will then see the Buckaroo UI.
At its core, Buckaroo commands operate on columns. You must first click on a cell (not a header) in the top pane to select a column.
Next, click on a command like dropcol, fillna, or groupby to create a new command.
After creating a new command, you will see it in the commands list. Now edit the details of the command: select it by clicking on the bottom cell.
At this point you can either delete the command by clicking the X button or change the command parameters.
Builtin commands are found in all_transforms.py
Here is a simple example command:
class DropCol(Command):
    command_default = [s('dropcol'), s('df'), "col"]
    command_pattern = [None]

    @staticmethod
    def transform(df, col):
        # Remove the selected column from the dataframe in place.
        df.drop(col, axis=1, inplace=True)
        return df

    @staticmethod
    def transform_to_py(df, col):
        # Emit the equivalent Python statement, indented 4 spaces.
        return "    df.drop('%s', axis=1, inplace=True)" % col
command_default is the base configuration of the command when first added. s('dropcol') is a special notation for the function name. s('df') is a symbol notation for the dataframe argument (see LISP section for details). "col" is a placeholder for the selected column.
Since dropcol does not take any extra arguments, command_pattern is [None].
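To make the placeholder mechanics concrete, here is a minimal sketch of how a command_default template might be turned into a concrete command once a column is selected. The dict-based `s` helper and the `instantiate_command` function are assumptions made for illustration; they are not taken from Buckaroo's source.

```python
import copy


def s(name):
    # Hypothetical stand-in for the symbol helper: marks a name as a symbol.
    return {"symbol": name}


def instantiate_command(command_default, selected_column):
    """Return a copy of the template with the "col" placeholder filled in."""
    command = copy.deepcopy(command_default)
    return [selected_column if part == "col" else part for part in command]


dropcol_default = [s("dropcol"), s("df"), "col"]
command = instantiate_command(dropcol_default, "price")
print(command)
```

The template itself is left untouched, so the same command_default can be reused each time the command is added for a different column.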
def transform(df, col):
    df.drop(col, axis=1, inplace=True)
    return df
This transform is the function that manipulates the dataframe. For dropcol we take two arguments: the dataframe and the column name.
def transform_to_py(df, col):
    return "    df.drop('%s', axis=1, inplace=True)" % col
transform_to_py emits equivalent Python code for this transform. The code is indented 4 spaces for use in a function.
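The 4-space indentation matters because the emitted statements are meant to sit inside a function body. As a rough sketch of that idea, the snippet below wraps a list of emitted lines in a function that takes and returns `df`. The names `assemble_function`, `dropcol_to_py`, and `clean` are hypothetical; how Buckaroo itself assembles the generated code is not shown in this document.

```python
def assemble_function(py_lines, func_name="clean"):
    """Wrap emitted, 4-space-indented statements in a function over df."""
    body = "\n".join(py_lines)
    return "def %s(df):\n%s\n    return df\n" % (func_name, body)


def dropcol_to_py(col):
    # Mirrors the DropCol.transform_to_py example (df argument omitted here).
    return "    df.drop('%s', axis=1, inplace=True)" % col


source = assemble_function([dropcol_to_py("price"), dropcol_to_py("notes")])
print(source)

# compile() raises SyntaxError if the assembled source is malformed.
compile(source, "<generated>", "exec")
```

Because each transform emits a self-contained statement, the generated function can be pasted into another notebook and run without Buckaroo installed.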
class GroupBy(Transform):
    command_default = [s("groupby"), s('df'), 'col', {}]
    command_pattern = [[3, 'colMap', 'colEnum', ['null', 'sum', 'mean', 'median', 'count']]]

    @staticmethod
    def transform(df, col, col_spec):
        grps = df.groupby(col)
        df_contents = {}
        # col_spec maps each column name to the aggregation chosen in the UI.
        for k, v in col_spec.items():
            if v == "sum":
                df_contents[k] = grps[k].apply(lambda x: x.sum())
            elif v == "mean":
                df_contents[k] = grps[k].apply(lambda x: x.mean())
            elif v == "median":
                df_contents[k] = grps[k].apply(lambda x: x.median())
            elif v == "count":
                df_contents[k] = grps[k].apply(lambda x: x.count())
        return pd.DataFrame(df_contents)
The GroupBy command is more complex. It takes a 3rd argument, col_spec, which is an argument of type colEnum. A colEnum argument tells the UI to display a table with all column names and a drop-down box of enum options.
In this case each column can have an operation of either sum, mean, median, or count applied to it.
Note also the leading 3 in the command_pattern. It tells the UI that these are the specs for the 3rd element of the command. Eventually commands will be able to have multiple configured arguments.
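As an illustration of how a pattern entry relates to a command, the sketch below checks a colEnum argument against its allowed options. It assumes the leading 3 is a zero-based index into the command list (which matches the position of `{}` in the GroupBy command_default) and uses plain dicts in place of symbols. `validate_col_enum` is a hypothetical helper, not part of Buckaroo.

```python
GROUPBY_PATTERN = [[3, "colMap", "colEnum", ["null", "sum", "mean", "median", "count"]]]


def validate_col_enum(command, pattern, columns):
    """Check each colEnum argument of a command against its pattern entry."""
    errors = []
    for entry in pattern:
        if entry is None:  # commands like dropcol have no extra arguments
            continue
        arg_index, _name, arg_type, options = entry
        if arg_type != "colEnum":
            continue
        col_spec = command[arg_index]  # assumed zero-based position
        for col, op in col_spec.items():
            if col not in columns:
                errors.append("unknown column: %s" % col)
            elif op not in options:
                errors.append("invalid option %r for column %s" % (op, col))
    return errors


command = [{"symbol": "groupby"}, {"symbol": "df"}, "city",
           {"price": "mean", "qty": "max"}]
print(validate_col_enum(command, GROUPBY_PATTERN, ["city", "price", "qty"]))
```

Here "max" is rejected because it is not among the options listed in the pattern, while "mean" passes.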
Arguments can currently be configured as:
integer - allowing an integer input
enum - allowing a strict set of options, returned as a string to the transform
colEnum - allowing a strict set of options per column, returned as a dictionary keyed on column with values of enum options
The ideal order of operations is as follows:
1. Column level fixes
2. DataFrame transformations 1: these transforms largely keep the shape of the data the same
3. DataFrame transformations 2: these result in a single new dataframe with a vastly different shape
4. DataFrame transformations 3: these transforms emit multiple DataFrames
5. DataFrame combination
Buckaroo can only work on a single input dataframe shape at a time. Any newly created columns are visible on output, but not available for manipulation in the same Buckaroo Cell.
There are a couple of projects like Buckaroo that aim to provide a better table widget and pandas editing experience.
For a development installation:
git clone https://github.com/paddymul/buckaroo.git
cd buckaroo
conda install ipywidgets=8 jupyterlab
pip install -ve .
Enabling development install for Jupyter notebook:
Enabling development install for JupyterLab:
jupyter labextension develop . --overwrite
Note for developers: the --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available with Windows.
We :heart: contributions.
Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?
We welcome issue reports here; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.
Before sending a Pull Request, please make sure you read our
FAQs
Fast Datagrid widget for the Jupyter Notebook and JupyterLab
We found that buckaroo-data demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.