Create a component.json file (JSON format) inside this directory and fill in all of the following fields:
    {
        "engineType": "Generic",
        "language": "Python",
        "userStandalone": false,
        "name": "<Component name (e.g., string_source)>",
        "label": "<A label that is displayed in the UI>",
        "version": "<Component's version (e.g., 1.0.0)>",
        "group": "<One of the valid groups (e.g., Connectors)>",
        "program": "<The Python component main script (e.g., string_source.py)>",
        "componentClass": "<The component class name (e.g., StringSource)>",
        "useMLStats": <true|false (whether the component uses MLStats)>,
        "inputInfo": [
            {
                "description": "<Description>",
                "label": "<Label name>",
                "defaultComponent": "",
                "type": "<A type used to verify matching connected legs>",
                "group": "<data|model|prediction|statistics|other>"
            },
            {...}
        ],
        "outputInfo": [
            <Same as inputInfo above>
        ],
        "arguments": [
            {
                "key": "<Unique argument key name>",
                "type": "<int|long|float|str|bool>",
                "label": "<A label that is displayed in the UI>",
                "description": "<Description>",
                "optional": <true|false>
            }
        ]
    }
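
As a rough illustration of the template above, the following sketch fills in a descriptor for the string_source example and checks that every top-level field is present before the result is saved as component.json. The check_descriptor helper is hypothetical and is not part of mlpiper. The label and description texts, the "str" leg type, and the "value" argument entry are assumptions made for this sketch.

```python
import json

# Top-level fields listed in the template above.
REQUIRED_FIELDS = [
    "engineType", "language", "userStandalone", "name", "label", "version",
    "group", "program", "componentClass", "useMLStats",
    "inputInfo", "outputInfo", "arguments",
]

# Argument types listed in the template above.
ARGUMENT_TYPES = {"int", "long", "float", "str", "bool"}


def check_descriptor(descriptor):
    """Return a list of problems found in a descriptor (hypothetical helper)."""
    problems = ["missing field: " + f for f in REQUIRED_FIELDS if f not in descriptor]
    for arg in descriptor.get("arguments", []):
        if arg.get("type") not in ARGUMENT_TYPES:
            problems.append("bad argument type for key: " + str(arg.get("key")))
    return problems


# Illustrative descriptor; labels, descriptions and the "str" leg type
# are assumptions, not values taken from mlpiper.
descriptor = {
    "engineType": "Generic",
    "language": "Python",
    "userStandalone": False,
    "name": "string_source",
    "label": "String Source",
    "version": "1.0.0",
    "group": "Connectors",
    "program": "string_source.py",
    "componentClass": "StringSource",
    "useMLStats": False,
    "inputInfo": [],
    "outputInfo": [
        {
            "description": "The emitted string",
            "label": "String",
            "defaultComponent": "",
            "type": "str",
            "group": "data",
        }
    ],
    "arguments": [
        {
            "key": "value",
            "type": "str",
            "label": "Value",
            "description": "String to emit",
            "optional": True,
        }
    ],
}

problems = check_descriptor(descriptor)
text = json.dumps(descriptor, indent=4)  # content to save as component.json
```

An empty problems list only means that the fields exist. It does not confirm that mlpiper accepts the values.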
Create the main component script, which defines the class named in componentClass.
This class should inherit from the 'Component' base class, which is taken from mlpiper.components.component.
The class must implement the _materialize function, with this prototype: def _materialize(self, parent_data_objs, user_data).
Here is a complete, self-contained example:
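    from mlpiper.components import ConnectableComponent


    class StringSource(ConnectableComponent):
        def __init__(self, engine):
            # Name the class explicitly: super(self.__class__, self) recurses
            # endlessly if StringSource is ever subclassed.
            super(StringSource, self).__init__(engine)

        def _materialize(self, parent_data_objs, user_data):
            self._logger.info("Inside string source component")
            # Read the 'value' pipeline parameter, falling back to a default.
            str_value = self._params.get('value', "default-string-value")
            # The returned list is the input of the next component in the chain.
            return [str_value]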
Notes:
- A component can use the self._logger object to print logs.
- A component can access pipeline parameters via the self._params dictionary.
- The _materialize function should return a list of objects, or None if it has nothing to pass on. The returned value is used as the input for the next component in the pipeline chain.
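
The hand-off described in the last note can be sketched without mlpiper installed. StandInComponent, StringUpper and run_chain below are hypothetical stand-ins written for this illustration. They are not mlpiper APIs, and the real runtime may construct components and pass data differently (for example, what the first component receives as parent_data_objs is an assumption here).

```python
import logging


class StandInComponent(object):
    """Hypothetical stand-in for the mlpiper base class, for illustration only."""

    def __init__(self, params):
        self._logger = logging.getLogger(self.__class__.__name__)
        self._params = params

    def _materialize(self, parent_data_objs, user_data):
        raise NotImplementedError


class StringSource(StandInComponent):
    def _materialize(self, parent_data_objs, user_data):
        # Same logic as the example above: emit one string in a list.
        return [self._params.get("value", "default-string-value")]


class StringUpper(StandInComponent):
    """Hypothetical downstream component that upper-cases its input."""

    def _materialize(self, parent_data_objs, user_data):
        self._logger.info("Received %d input object(s)", len(parent_data_objs))
        return [parent_data_objs[0].upper()]


def run_chain(components):
    """Feed each component's returned list to the next one (assumed behavior)."""
    data = []  # assumption: the first component has no parent objects
    for component in components:
        data = component._materialize(data, user_data={})
    return data


result = run_chain([StringSource({"value": "hello"}), StringUpper({})])
default_result = run_chain([StringSource({}), StringUpper({})])
```

Here result holds the upper-cased "hello" and default_result the upper-cased default string, because each list returned by _materialize becomes the parent_data_objs of the following component.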