
The DataRobot mlpiper
module is designed to process and execute complex pipelines that
consist of one or more components chained together, such that the output of
one component becomes the input to the next. Each pipeline
has a particular purpose, such as training a model or generating predictions.
A single pipeline may include components written in different languages, such as Python, R, and Java.
pip install "mlpiper[sagemaker,pyspark,wizard,mlops]"
Note: the extra installation options are as follows:
sagemaker: provides support for SageMaker pipelines, which requires proper AWS credentials
pyspark: provides support for PySpark pipelines
wizard: provides a wizard capability to create a component metadata file from the command line
Create a pipeline. Open any text editor and copy the following pipeline description:
{
    "name": "Simple MCenter runner test",
    "engineType": "Generic",
    "pipe": [
        {
            "name": "Source String",
            "id": 1,
            "type": "string-source",
            "parents": [],
            "arguments": {
                "value": "Hello World: testing string source and sink"
            }
        },
        {
            "name": "Sink String",
            "id": 2,
            "type": "string-sink",
            "parents": [{"parent": 1, "output": 0}],
            "arguments": {
                "expected-value": "Hello World: testing string source and sink"
            }
        }
    ]
}
Clone the mlpiper repo: https://github.com/mlpiper/mlpiper/
The string-source and string-sink components can be found in the repo under https://github.com/mlpiper/mlpiper/tree/master/reflex-algos/components/Python
Once the mlpiper Python package is installed, the mlpiper command-line tool is available and can be used to execute the above pipeline and the components described in it. Run the example above with:
mlpiper run -f ~/<pipeline description file> -r <path to mlpiper repo>/reflex-algos/components/Python
Create a directory whose name corresponds to the component's name (e.g., string_source)
Create a component.json
file (JSON format) inside this directory and make sure to fill in all of the following fields:
{
    "engineType": "Generic",
    "language": "Python",
    "userStandalone": false,
    "name": "<Component name (e.g., string_source)>",
    "label": "<A label that is displayed in the UI>",
    "version": "<Component's version (e.g., 1.0.0)>",
    "group": "<One of the valid groups (e.g., 'Connectors')>",
    "program": "<The Python component main script (e.g., string_source.py)>",
    "componentClass": "<The component class name (e.g., StringSource)>",
    "useMLStats": <true|false - whether the component uses MLStats>,
    "inputInfo": [
        {
            "description": "<Description>",
            "label": "<Label name>",
            "defaultComponent": "",
            "type": "<A type used to verify matching connected legs>",
            "group": "<data|model|prediction|statistics|other>"
        },
        {...}
    ],
    "outputInfo": [
        <Same as inputInfo above>
    ],
    "arguments": [
        {
            "key": "<Unique argument key name>",
            "type": "int|long|float|str|bool",
            "label": "<A label that is displayed in the UI>",
            "description": "<Description>",
            "optional": <true|false>
        }
    ]
}
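To make the template concrete, a filled-in component.json for the string_source component might look like the sketch below. The field values (label text, descriptions, the single output leg) are illustrative assumptions based on the example pipeline above, not the actual file from the repo:

```json
{
    "engineType": "Generic",
    "language": "Python",
    "userStandalone": false,
    "name": "string_source",
    "label": "String Source",
    "version": "1.0.0",
    "group": "Connectors",
    "program": "string_source.py",
    "componentClass": "StringSource",
    "useMLStats": false,
    "inputInfo": [],
    "outputInfo": [
        {
            "description": "The emitted string value",
            "label": "string",
            "defaultComponent": "",
            "type": "str",
            "group": "data"
        }
    ],
    "arguments": [
        {
            "key": "value",
            "type": "str",
            "label": "Value",
            "description": "The string value to emit",
            "optional": true
        }
    ]
}
```

Note that a pure source component has an empty inputInfo list, and the "value" argument key matches the key looked up via self._params in the component script.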
Create the main component script, which contains the component's class.
This class should inherit from the ConnectableComponent base class, which is taken from
mlpiper.components. The class must implement the _materialize
method, with this prototype: def _materialize(self, parent_data_objs, user_data).
Here is a complete, self-contained example:

from mlpiper.components import ConnectableComponent


class StringSource(ConnectableComponent):
    def __init__(self, engine):
        super(StringSource, self).__init__(engine)

    def _materialize(self, parent_data_objs, user_data):
        self._logger.info("Inside string source component")
        str_value = self._params.get('value', "default-string-value")
        return [str_value]
Notes:
Use the self._logger object to print logs.
Use the self._params dictionary to access the component's parameters.
The _materialize function should return a list of objects, or None otherwise.
The returned value will be used as the input for the next component
in the pipeline chain.
Place the component's main program (*.py) inside a directory along with its JSON description file and any other desired files.
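The chaining behavior described above can be illustrated with a minimal, self-contained sketch. The ToyComponent base class and run_chain helper below are simplified stand-ins invented for illustration; the real base class lives in mlpiper.components and does much more (logging, engines, validation):

```python
class ToyComponent:
    """Illustrative stand-in for mlpiper's component base class."""

    def __init__(self, params):
        self._params = params  # mirrors the self._params dictionary

    def _materialize(self, parent_data_objs, user_data):
        raise NotImplementedError


class ToyStringSource(ToyComponent):
    def _materialize(self, parent_data_objs, user_data):
        # A source has no parents: it produces a value from its parameters.
        return [self._params.get("value", "default-string-value")]


class ToyStringSink(ToyComponent):
    def _materialize(self, parent_data_objs, user_data):
        # The parent's returned list arrives here as parent_data_objs.
        assert parent_data_objs[0] == self._params["expected-value"]
        return None  # a sink produces no further output


def run_chain(components, user_data=None):
    """Feed each component's output into the next, as a pipeline runner would."""
    data = []
    for comp in components:
        data = comp._materialize(data, user_data) or []
    return data


msg = "Hello World: testing string source and sink"
run_chain([ToyStringSource({"value": msg}),
           ToyStringSink({"expected-value": msg})])
```

The key point is the hand-off in run_chain: whatever list _materialize returns becomes the parent_data_objs argument of the next component.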
Open any text editor and copy the following pipeline description:
{
    "name": "Simple MCenter runner test",
    "engineType": "Generic",
    "pipe": [
        {
            "name": "Source String",
            "id": 1,
            "type": "string-source",
            "parents": [],
            "arguments": {
                "value": "Hello World: testing string source and sink"
            }
        },
        {
            "name": "Sink String",
            "id": 2,
            "type": "string-sink",
            "parents": [{"parent": 1, "output": 0}],
            "arguments": {
                "expected-value": "Hello World: testing string source and sink"
            }
        }
    ]
}
Notes:
The pipeline consists of two components: string-source and string-sink.
The output of the string-source component (the value returned from its _materialize function) is supposed to become the input of the string-sink component (the parent_data_objs input to its _materialize function).
Save the pipeline description file with any desired name.
Once the mlpiper Python package is installed, the mlpiper command-line tool is available
and can be used to execute the above pipeline and the components described in it.
There are three main commands:
deploy - Deploys a pipeline along with the provided components into a given directory. Once deployed, the pipeline can be executed directly from that directory.
run - Deploys and then executes the pipeline.
run-deployment - Executes an already-deployed pipeline.
Prepare a deployment. The resulting directory can be copied to a Docker container and run there:
mlpiper deploy -f p1.json -r ~/dev/components -d /tmp/pp
Deploy & Run in-place:
mlpiper run -f p1.json -r ~/dev/components
Deploy & Run. Useful for development and debugging:
mlpiper run -f p1.json -r ~/dev/components -d /tmp/pp
Run a deployment. Usually non-interactive and called by another script:
mlpiper run-deployment --deployment-dir /tmp/pp