Data flow tool that transforms your notebooks and Python files into pipeline steps by standardizing data input/output (for data science projects).
Create clean data flow pipelines just by replacing your pd.read_csv() and df.to_csv() calls with sf.load() and sf.save().
pip install stdflow
from stdflow import StepRunner
from stdflow.pipeline import Pipeline

# Pipeline with 2 steps
dm = "../demo_project/notebooks/"
ingestion_ppl = Pipeline([
    StepRunner(dm + "01_ingestion/countries.ipynb"),
    StepRunner(dm + "01_ingestion/world_happiness.ipynb"),
])

# === OR ===
ingestion_ppl = Pipeline(
    StepRunner(dm + "01_ingestion/countries.ipynb"),
    StepRunner(dm + "01_ingestion/world_happiness.ipynb"),
)

# === OR ===
ingestion_ppl = Pipeline()
ingestion_ppl.add_step(StepRunner(dm + "01_ingestion/countries.ipynb"))
# OR
ingestion_ppl.add_step(dm + "01_ingestion/world_happiness.ipynb")

ingestion_ppl
================================
PIPELINE
================================
STEP 1
path: ../demo_project/notebooks/01_ingestion/countries.ipynb
vars: {}
STEP 2
path: ../demo_project/notebooks/01_ingestion/world_happiness.ipynb
vars: {}
================================
Run the pipeline
ingestion_ppl.run(verbose=True, kernel=":any_available")
=================================================================================
01. ../demo_project/notebooks/01_ingestion/countries.ipynb
=================================================================================
Variables: {}
using kernel: python3
Path: ../demo_project/notebooks/01_ingestion/countries.ipynb
Duration: 0 days 00:00:00.603051
Env: {}
Notebook executed successfully.
=================================================================================
02. ../demo_project/notebooks/01_ingestion/world_happiness.ipynb
=================================================================================
Variables: {}
using kernel: python3
Path: ../demo_project/notebooks/01_ingestion/world_happiness.ipynb
Duration: 0 days 00:00:00.644909
Env: {}
Notebook executed successfully.
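Conceptually, a Pipeline is just an ordered collection of steps executed in sequence. A minimal stdlib sketch of that idea (MiniPipeline is hypothetical, not stdflow's implementation, and uses plain callables instead of notebooks):

```python
class MiniPipeline:
    """Toy illustration of the run-steps-in-order idea; not stdflow's API."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def add_step(self, step):
        self.steps.append(step)

    def run(self):
        # execute each step in insertion order and collect results
        return [step() for step in self.steps]


ppl = MiniPipeline(lambda: "countries", lambda: "world_happiness")
print(ppl.run())  # ['countries', 'world_happiness']
```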
import stdflow as sf
import pandas as pd

# load data from ../demo_project/data/countries/step_created/v_202309212245/countries.csv
df = sf.load(
    root="../demo_project/data/",
    attrs=['countries'],
    step='created',
    version=':last',  # loads the last version in alphanumeric order
    file_name='countries.csv',
    method=pd.read_csv,  # or method='csv'
    verbose=False,
)

# export data to ../demo_project/data/countries/step_loaded/v_2023-03/countries.csv
sf.save(
    df,
    root="../demo_project/data/",
    attrs='countries/',
    step='loaded',
    version='%Y-03',  # creates v_2023-03
    file_name='countries.csv',
    method=pd.DataFrame.to_csv,  # or method='csv', or any function taking the object to export as first argument
)
attrs=countries/::step_name=loaded::version=2023-03::file_name=countries.csv
Each time you perform a save, a metadata.json file is created in the folder. This keeps track of how your data was created and other information.
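The version=':last' selector shown above relies on alphanumeric ordering of the version folder names. A quick stdlib illustration of why timestamp-style versions sort correctly:

```python
# ':last' picks the greatest version folder name in alphanumeric order;
# for zero-padded %Y%m%d%H%M stamps this is also the most recent one.
versions = ["v_202309212245", "v_202301010000", "v_202310101716"]
last = max(versions)
print(last)  # v_202310101716
```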
import stdflow as sf

sf.reset()  # needed when multiple steps are run with the same singleton (not recommended); see below

# use package-level default values
sf.root = "../demo_project/data/"
sf.attrs = 'countries'  # if needed, use attrs_in and attrs_out
sf.step_in = 'loaded'
sf.step_out = 'formatted'

df = sf.load()
# root / attrs / step: taken from the default values set above
# version: the last version is used automatically (default ":last")
# file_name: the file, alone in its folder, is found automatically
# method: inferred from the file extension

sf.save(df)
# root / attrs / step: taken from the default values set above
# version: uses the default %Y%m%d%H%M format
# file_name: taken from the input (because there is only one file)
# method: inferred from the file name
attrs=countries::step_name=formatted::version=202310101716::file_name=countries.csv
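The default version shown above is a %Y%m%d%H%M timestamp. A quick check of that format string with plain datetime (no stdflow needed):

```python
from datetime import datetime

# the default version stamp format used when saving
stamp = datetime(2023, 10, 10, 17, 16).strftime("%Y%m%d%H%M")
print(stamp)  # 202310101716
```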
Note that everything we did at package level can also be done with the Step class. When you have multiple steps in a notebook, create one Step object per step; stdflow (sf) at package level is a singleton instance of Step.
from stdflow import Step

step = Step(
    root="../demo_project/data/",
    attrs='countries',
    step_in='formatted',
    step_out='pre_processed'
)

# or set attributes after creation
step.root = "../demo_project/data/"
# ...

df = step.load(version=':last', file_name=":auto", verbose=True)
step.save(df, verbose=True)
INFO:stdflow.step:Loading data from ../demo_project/data/countries/step_formatted/v_202310101716/countries.csv
INFO:stdflow.step:Data loaded from ../demo_project/data/countries/step_formatted/v_202310101716/countries.csv
INFO:stdflow.step:Saving data to ../demo_project/data/countries/step_pre_processed/v_202310101716/countries.csv
INFO:stdflow.step:Data saved to ../demo_project/data/countries/step_pre_processed/v_202310101716/countries.csv
INFO:stdflow.step:Saving metadata to ../demo_project/data/countries/step_pre_processed/v_202310101716/
attrs=countries::step_name=pre_processed::version=202310101716::file_name=countries.csv
import stdflow as sf
step.save(df, verbose=True, export_viz_tool=True)
INFO:stdflow.step:Saving data to ../demo_project/data/countries/step_pre_processed/v_202310101716/countries.csv
INFO:stdflow.step:Data saved to ../demo_project/data/countries/step_pre_processed/v_202310101716/countries.csv
INFO:stdflow.step:Saving metadata to ../demo_project/data/countries/step_pre_processed/v_202310101716/
INFO:stdflow.step:Exporting viz tool to ../demo_project/data/countries/step_pre_processed/v_202310101716/
attrs=countries::step_name=pre_processed::version=202310101716::file_name=countries.csv
This command exports a metadata_viz folder in the same folder as the data you exported. The metadata to display is saved in the metadata.json file. To display it, get both the file and the folder onto your local machine (download them if you are working on a server), then open the HTML file from your file explorer. It opens in your browser and lets you upload the metadata.json file.
Data folder organization is systematic and is used by the load and save functions. It follows this format: root_data_folder/attrs_1/attrs_2/…/attrs_n/step_name/version/file_name
where step folders are prefixed with step_ and version folders with v_.
Each folder is the output of a step. It contains a metadata.json file with information about all files in the folder and how they were generated. It can also contain an HTML page (if you set export_viz_tool=True in save()) that lets you visualize the pipeline and your metadata.
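Under these conventions, a data path can be assembled mechanically. A sketch with pathlib (data_path is a hypothetical helper, not part of stdflow):

```python
from pathlib import Path


def data_path(root, attrs, step, version, file_name):
    # 'step_' and 'v_' prefixes follow the folder convention described above
    return Path(root, *attrs, f"step_{step}", f"v_{version}", file_name)


p = data_path("../demo_project/data", ["countries"], "formatted",
              "202310101716", "countries.csv")
print(p.as_posix())  # ../demo_project/data/countries/step_formatted/v_202310101716/countries.csv
```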
If you run multiple steps with the package-level singleton, use sf.reset() as part of your final code.
We found that stdflow demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.