Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
Stop micromanaging execution. Focus on the science. Capture your workflow's essence with function pipelines, represent computations as DAGs, and automate parallel sweeps.
pipefunc is a Python library designed for creating and executing function pipelines.
By simply annotating functions and specifying their outputs, it builds a pipeline that automatically manages the execution order based on dependencies.
Visualize the pipeline as a directed graph, execute the pipeline for all (or specific) outputs, add multidimensional sweeps, automatically parallelize the pipeline, and get nicely structured data back.
Note: A pipeline is a sequence of interconnected functions, structured as a Directed Acyclic Graph (DAG), where outputs from one or more functions serve as inputs to subsequent ones. pipefunc streamlines the creation and management of these pipelines, offering powerful tools to efficiently execute them.
Whether you're working with data processing, scientific computations, machine learning (AI) workflows, or any other scenario involving interdependent functions, pipefunc helps you focus on the logic of your code while it handles the intricacies of function dependencies and execution order.
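To make "execution order based on dependencies" concrete, here is a minimal standard-library sketch of ordering a small DAG. It does not use pipefunc and is not a description of its internals; the `dependencies` mapping and its names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each output name -> the names it needs as inputs.
dependencies = {
    "c": {"a", "b"},
    "d": {"b", "c"},
    "e": {"c", "d"},
}

# static_order() yields each node only after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())

# Keep only computed outputs; "a" and "b" are root inputs supplied by the caller.
execution_order = [name for name in order if name in dependencies]
print(execution_order)  # ['c', 'd', 'e']
```

A pipeline library performs this kind of ordering for you, so the functions can be declared in any order.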
Pipelines are built with the @pipefunc decorator; execution order is automatically handled. pipefunc determines which other functions to call based on the provided arguments.
pipefunc provides a Pipeline class that you use to define your function pipeline.
You add functions to the pipeline using the pipefunc decorator, which also lets you specify the function's output name.
Once your pipeline is defined, you can execute it for specific output values, simplify it by combining function nodes, visualize it as a directed graph, and profile the resource usage of the pipeline functions.
For more detailed usage instructions and examples, please check the usage example provided in the package.
Here is a simple example usage of pipefunc to illustrate its primary features:
from pipefunc import pipefunc, Pipeline

# Define three functions that will be a part of the pipeline
@pipefunc(output_name="c")
def f_c(a, b):
    return a + b

@pipefunc(output_name="d")
def f_d(b, c):
    return b * c

@pipefunc(output_name="e")
def f_e(c, d, x=1):
    return c * d * x

# Create a pipeline with these functions
pipeline = Pipeline([f_c, f_d, f_e], profile=True)  # `profile=True` enables resource profiling

# Call the pipeline directly for different outputs:
assert pipeline("d", a=2, b=3) == 15
assert pipeline("e", a=2, b=3) == 75

# Visualize the pipeline
pipeline.visualize()

# Show resource reporting (only works if profile=True)
pipeline.print_profiling_stats()
This example demonstrates defining a pipeline with the f_c, f_d, and f_e functions, executing it for specific outputs, visualizing the pipeline graph, and reporting on resource usage.
It should give you an idea of how to use pipefunc to construct and manage function pipelines.
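For intuition about how a call for one output can pull in only the functions it needs, the following standard-library sketch resolves an output name recursively from function signatures. It is an illustration under stated assumptions, not pipefunc's implementation: the `producers` registry and the `resolve` helper are hypothetical names, and the three functions are plain, undecorated copies of the ones above.

```python
import inspect


def f_c(a, b):
    return a + b


def f_d(b, c):
    return b * c


def f_e(c, d, x=1):
    return c * d * x


# Hypothetical registry mapping each output name to the function that produces it.
producers = {"c": f_c, "d": f_d, "e": f_e}


def resolve(output_name, **inputs):
    """Compute `output_name`, calling upstream functions only when needed."""
    cache = dict(inputs)  # values that are already known

    def get(name):
        if name in cache:
            return cache[name]
        func = producers[name]  # raises KeyError if nothing produces `name`
        kwargs = {}
        for param in inspect.signature(func).parameters.values():
            if param.name in cache or param.name in producers:
                kwargs[param.name] = get(param.name)
            elif param.default is inspect.Parameter.empty:
                raise TypeError(f"missing required input {param.name!r}")
            # otherwise the function's own default value is used
        cache[name] = func(**kwargs)
        return cache[name]

    return get(output_name)


print(resolve("d", a=2, b=3))  # 15
print(resolve("e", a=2, b=3))  # 75
```

Passing an intermediate value directly, for example `resolve("e", c=5, d=15)`, skips `f_c` and `f_d` entirely because both values are already in the cache.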
The following example demonstrates how to perform a map-reduce operation using pipefunc:
from pipefunc import pipefunc, Pipeline
from pipefunc.map import load_outputs
import numpy as np

@pipefunc(output_name="c", mapspec="a[i], b[j] -> c[i, j]")  # the mapspec is used to specify the mapping
def f(a: int, b: int):
    return a + b

@pipefunc(output_name="mean")  # there is no mapspec, so this function takes the full 2D array
def g(c: np.ndarray):
    return np.mean(c)

pipeline = Pipeline([f, g])
inputs = {"a": [1, 2, 3], "b": [4, 5, 6]}
pipeline.map(inputs, run_folder="my_run_folder", parallel=True)
result = load_outputs("mean", run_folder="my_run_folder")
print(result)  # prints 7.0
Here the mapspec argument specifies the mapping between the inputs and outputs of the f function: it forms the product of the a and b input lists and computes the sum of each pair. The g function then computes the mean of the resulting 2D array. The map method executes the pipeline for the given inputs, and the load_outputs function loads the result of the g function from the specified run folder.
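As a cross-check of what this map-reduce computes, here is a plain-Python sketch of the same arithmetic without pipefunc, NumPy, or a run folder. It only mirrors the index pattern of the mapspec; it says nothing about how pipefunc schedules or stores the work.

```python
from statistics import fmean

a = [1, 2, 3]
b = [4, 5, 6]

# "a[i], b[j] -> c[i, j]": one call per (i, j) pair, stored in a 2D grid.
c = [[a_i + b_j for b_j in b] for a_i in a]

# The reduction has no mapspec: it receives the whole grid at once.
mean = fmean(value for row in c for value in row)
print(c)     # [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
print(mean)  # 7.0
```

Three values of a and three of b give a 3 x 3 grid of nine sums, whose mean is 63 / 9 = 7.0, matching the printed result above.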
See the detailed usage example and more in our example.ipynb.
Install the latest stable version from conda (recommended):
conda install pipefunc
or from PyPI:
pip install "pipefunc[all]"
or install main with:
pip install -U https://github.com/pipefunc/pipefunc/archive/main.zip
or clone the repository and do a dev install (recommended for dev):
git clone git@github.com:pipefunc/pipefunc.git
cd pipefunc
pip install -e ".[dev]"
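After any of these installation routes, one way to confirm that the package is visible to the current interpreter is to query its installed metadata. This is a generic standard-library sketch; `installed_version` is a hypothetical helper name, not part of pipefunc.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if pipefunc is installed, otherwise None.
print(installed_version("pipefunc"))
```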
We use pre-commit to manage pre-commit hooks, which helps us ensure that our code is always clean and compliant with our coding standards.
To set it up, install pre-commit with pip and then run the install command:
pip install pre-commit
pre-commit install
FAQs
A Python library for defining, managing, and executing function pipelines.
We found that pipefunc demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.