snaplib
Simple data preprocessing tools.
user guide
Kaggle Notebook
Classification
https://www.kaggle.com/code/artyomkolas/titanic-snaplib-classification/notebook
Regression
https://www.kaggle.com/code/artyomkolas/housing-prices-with-snaplib/notebook
PyPI
!pip install snaplib
from snaplib.snaplib import Snaplib
sl = Snaplib()
- sl.nan_info
- sl.nan_plot
- sl.cleane
- sl.recover_data - NaN imputation with ML
- sl.dummied
- sl.encode_dataframe
- sl.decode_dataframe
- sl.train_test_split_balanced
- sl.k_folds_split
For a single algorithm or a list of algorithms, with bagging
- sl.cross_val
- sl.features_selection_regr
- sl.features_selection_clsf
- sl.fit_stacked
- sl.save_stack
- sl.load_stack
- sl.predict_stacked
doc
print(sl.recover_data.__doc__)
Imputes missing values (np.nan) in tabular data (not time series).
Use case:
df = Snaplib().recover_data(df, device="cpu", verbose=True)
device must be "cpu" or "gpu". Small datasets sometimes run faster on "cpu".
verbose = True: the algorithm runs cross-validation tests and prints their results to support decision making.
discrete_columns = ['col_name_1', 'col_name_2', 'col_name_3', 'etc']
Tests: https://www.kaggle.com/code/artyomkolas/nan-prediction-in-progress/notebook
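For readers new to the idea of "NaN imputing with ML", here is a minimal, dependency-free sketch of the general technique: fit a model on the rows where a column is known, then predict that column for the rows where it is missing. This is not snaplib's implementation, and it does not use the snaplib API. The function name impute_with_regression, the one-feature least-squares line, and the sample data are all assumptions made for illustration only.

```python
import math


def impute_with_regression(rows, target, feature):
    """Fill NaN values of `target` using a least-squares line fitted on `feature`.

    Hypothetical helper for illustration; not part of snaplib.
    Returns new row dicts and leaves the input rows untouched.
    """
    # Training set: rows where both the feature and the target are known.
    known = [
        (r[feature], r[target])
        for r in rows
        if not math.isnan(r[feature]) and not math.isnan(r[target])
    ]
    n = len(known)
    if n < 2:
        raise ValueError("need at least two complete rows to fit a line")

    mean_x = sum(x for x, _ in known) / n
    mean_y = sum(y for _, y in known) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in known)
    if var_x == 0:
        raise ValueError("feature is constant; cannot fit a line")

    # Ordinary least squares for a single feature.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in known) / var_x
    intercept = mean_y - slope * mean_x

    filled = []
    for r in rows:
        new_row = dict(r)
        # Predict only where the target is missing and the feature is present.
        if math.isnan(r[target]) and not math.isnan(r[feature]):
            new_row[target] = slope * r[feature] + intercept
        filled.append(new_row)
    return filled


# Made-up sample data: one missing price.
rows = [
    {"rooms": 1.0, "price": 100.0},
    {"rooms": 2.0, "price": 200.0},
    {"rooms": 3.0, "price": math.nan},
    {"rooms": 4.0, "price": 400.0},
]
filled = impute_with_regression(rows, target="price", feature="rooms")
# The known rows lie on price = 100 * rooms, so the filled value is close to 300.
print(filled[2]["price"])
```

A real imputer would typically use several feature columns and a stronger model, and would check prediction quality with cross-validation before trusting the filled values, which is the role the verbose option described above plays for recover_data.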