# IBM Watson Pipelines Python Client

This package provides various utilities for working with IBM Watson Pipelines. Its primary use is to let users store the artifact results of a notebook run.

## Usage

### Construction

A `WatsonPipelines` client is constructed from an IAM/CPD API key, which can be provided in a few ways:
- explicitly:

  ```python
  from ibm_watson_pipelines import WatsonPipelines

  client = WatsonPipelines(apikey)
  client = WatsonPipelines(apikey, url=url, username=username)
  client = WatsonPipelines.from_apikey(apikey, url=url, username=username)
  client = WatsonPipelines.from_token(token, url=url)
  ```
- implicitly:

  ```bash
  APIKEY=...
  export APIKEY
  USER_NAME=...
  export USER_NAME
  ```

  or

  ```bash
  USER_ACCESS_TOKEN=...
  export USER_ACCESS_TOKEN
  ```

  ```python
  from ibm_watson_pipelines import WatsonPipelines

  client = WatsonPipelines.from_apikey()
  client = WatsonPipelines.from_token()
  client = WatsonPipelines()
  client = WatsonPipelines.new_instance()
  ```
All of the above may also define `service_name` and `url`.

The exact procedure for deciding which authentication method to use:
- If `from_apikey` or `from_token` is used, that method is forced.
- If the constructor is used and either the `apikey` or `bearer_token` argument was provided, that method is forced (if both are present, an overloading error is raised). Note that providing a nameless argument is equivalent to providing `apikey`.
- If the constructor or `new_instance` is used, the `APIKEY` env-var is used.
- If the constructor or `new_instance` is used but the `APIKEY` env-var is not present, the `USER_ACCESS_TOKEN` env-var is used.
- If none of the above matches, an error is returned.
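The precedence above can be sketched in plain Python. This is an illustrative model only, not the library's actual implementation, and `resolve_auth` is a hypothetical helper name:

```python
import os

def resolve_auth(apikey=None, bearer_token=None):
    """Sketch of the documented credential-resolution order."""
    if apikey is not None and bearer_token is not None:
        # both methods forced at once: overloading error
        raise TypeError("provide either apikey or bearer_token, not both")
    if apikey is not None:
        return ("apikey", apikey)
    if bearer_token is not None:
        return ("token", bearer_token)
    if "APIKEY" in os.environ:
        return ("apikey", os.environ["APIKEY"])
    if "USER_ACCESS_TOKEN" in os.environ:
        return ("token", os.environ["USER_ACCESS_TOKEN"])
    raise RuntimeError("no credentials found")
```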
### Usage in Python notebooks

Notebooks run in IBM Watson Pipelines receive inputs and expose outputs as a node:
```json
{
  "id": ...,
  "type": "execution_node",
  "op": "run_container",
  "app_data": {
    "pipeline_data": {
      "name": ...,
      "config": {
        "link": {
          "component_id_ref": "run-notebook"
        }
      },
      "inputs": [
        ...,
        {
          "name": "model_name",
          "group": "env_variables",
          "type": "String",
          "value_from": ...
        }
      ],
      "outputs": [
        {
          "name": "trained_model",
          "group": "output_variables",
          "type": {
            "CPDPath": {
              "path_type": "resource",
              "resource_type": "asset",
              "asset_type": "wml_model"
            }
          }
        }
      ]
    }
  },
  ...
}
```
Inside the notebook, inputs are available as environment variables:

```python
import os

model_name = os.environ['model_name']
```
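Since every input arrives as a plain string in the environment, non-string inputs need an explicit cast. A small helper (hypothetical, not part of the SDK) might look like:

```python
import os
from typing import Optional

def read_input(name: str, default: Optional[str] = None) -> str:
    """Read a pipeline input exposed as an environment variable."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"pipeline input {name!r} is not set")
    return value

# e.g. a hypothetical numeric input would be cast explicitly:
# batch_size = int(read_input("batch_size", default="32"))
```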
Outputs are stored using the SDK method `store_results`:

```python
client = WatsonPipelines.from_apikey(...)

client.store_results({
    "trained_model": ...  # CPD path to the trained model
})
```
On public cloud, this client provides methods for easy retrieval of WML instance credentials and scope storage credentials:

```python
client.get_wml_credentials()
client.get_wml_credentials("cpd:///projects/123456789")

client.get_storage_credentials()
client.get_storage_credentials("cpd:///projects/123456789")
```

Note that the result will vary depending on the authentication method used to create the client.
### CPD-Path manipulation

CPD-Path parsing and manipulation are also supported:

```python
from ibm_watson_pipelines import CpdScope, WatsonPipelines

client = WatsonPipelines.from_apikey()

scope = CpdScope.from_string("cpd:///projects/123456789")
assert scope.scope_type() == "projects"
assert scope.scope_id() == "123456789"

client.get_wml_credentials(scope)
```
Different kinds of CPD-Paths have different properties, providing the same interface across scope, resource, and file paths:

```python
from ibm_watson_pipelines import CpdPath

scope_file_path = CpdPath.from_string("cpd:///projects/123456789/files/abc/def")
assert scope_file_path.scope_type() == "projects"
assert scope_file_path.scope_id() == "123456789"
assert scope_file_path.file_path() == "/abc/def"

connection_path = CpdPath.from_string("cpd:///projects/123456789/connections/3141592")
assert connection_path.scope_type() == "projects"
assert connection_path.scope_id() == "123456789"
assert connection_path.resource_type() == "connections"
assert connection_path.resource_id() == "3141592"

connection_file_path = CpdPath.from_string("cpd:///projects/123456789/connections/3141592/files/~/abc/def")
assert connection_file_path.scope_type() == "projects"
assert connection_file_path.scope_id() == "123456789"
assert connection_file_path.resource_type() == "connections"
assert connection_file_path.resource_id() == "3141592"
assert connection_file_path.bucket_name() == "~"
assert connection_file_path.file_path() == "/abc/def"
```
Additionally, for non-scope paths the scope can be extracted, if present:

```python
from ibm_watson_pipelines import CpdPath

scope_path = CpdPath.from_string("cpd:///projects/123456789")
connection_path = CpdPath.from_string("cpd:///projects/123456789/connections/3141592")
assert connection_path.scope() == scope_path
```
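As a rough mental model of the grammar these paths follow (scope, then optional resource, then optional file path), here is a minimal hand-rolled parser. It is illustrative only, and it does not separate the bucket segment of connection file paths; use `CpdPath` in real code:

```python
def parse_cpd_path(path: str) -> dict:
    """Toy parser for cpd:///<scope_type>/<scope_id>[/<resource_type>/<resource_id>][/files/<path>]."""
    prefix = "cpd:///"
    if not path.startswith(prefix):
        raise ValueError("not a CPD path: " + path)
    parts = path[len(prefix):].split("/")
    result = {"scope_type": parts[0], "scope_id": parts[1]}
    rest = parts[2:]
    if rest and rest[0] != "files":
        # a resource segment, e.g. connections/3141592
        result["resource_type"], result["resource_id"] = rest[0], rest[1]
        rest = rest[2:]
    if rest and rest[0] == "files":
        result["file_path"] = "/" + "/".join(rest[1:])
    return result
```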
## Custom components for use in the pipeline

A custom pipeline component executes a script you write. You can use custom components to share reusable scripts between pipelines.

You create custom components as project assets and can then use them in pipelines you create in that project. You can create as many custom components as you need; currently, a custom component must be created programmatically, from a Python function.
### Creating a component as a project asset

To create a custom component, use the Python client to authenticate with IBM Watson Pipelines, code the component, then publish it to the specified project. Once it is available in the project, you can assign it to a node in a pipeline and run it as part of a pipeline flow.

The following example publishes a component that adds two numbers together. Run this code in a Jupyter notebook in a project of your Cloud Pak for Data instance:
```python
! pip install ibm-watson-pipelines==1.0.4

from ibm_watson_pipelines import WatsonPipelines

apikey = ''
service_url = 'your_host_url'
project_id = 'your_project_id'
username = ''

client = WatsonPipelines.from_apikey(apikey, url=service_url, username=username)

def add_two_numbers(a: int, b: int) -> int:
    print('Adding numbers: {} + {}.'.format(a, b))
    return a + b

client.publish_component(
    name='Add numbers',
    func=add_two_numbers,
    description='Custom component adding numbers',
    project_id=project_id,
    overwrite=True,  # overwrite if a component with this name already exists
)
```
### Manage pipeline components

- list components from a project:

  ```python
  client.get_components(project_id=project_id)
  ```

- get a component by ID or by name:

  ```python
  client.get_component(project_id=project_id, component_id=component_id)
  client.get_component(project_id=project_id, name=component_name)
  ```

- publish a component, as shown in the example above:

  ```python
  client.publish_component(name=..., func=..., project_id=project_id)
  ```

- delete a component by ID:

  ```python
  client.delete_component(project_id=project_id, component_id=component_id)
  ```