Python SDK for working with https://github.com/conductor-oss/conductor.
Conductor is the leading open-source orchestration platform allowing developers to build highly scalable distributed applications.
Check out the official documentation for Conductor.
Show support for the Conductor OSS. Please help spread awareness by starring the Conductor repo.
Before installing the Conductor Python SDK, it is good practice to set up a dedicated virtual environment as follows:
virtualenv conductor
source conductor/bin/activate
The SDK requires Python 3.9+. To install the SDK, use the following command:
python3 -m pip install conductor-python
In this section, we will create a simple "Hello World" application that executes a "greetings" workflow managed by Conductor.
Create greetings_workflow.py with the following:
from conductor.client.workflow.conductor_workflow import ConductorWorkflow
from conductor.client.workflow.executor.workflow_executor import WorkflowExecutor
from greetings_worker import greet
def greetings_workflow(workflow_executor: WorkflowExecutor) -> ConductorWorkflow:
    name = 'greetings'
    workflow = ConductorWorkflow(name=name, executor=workflow_executor)
    workflow.version = 1
    workflow >> greet(task_ref_name='greet_ref', name=workflow.input('name'))
    return workflow
Alternatively, you can define the same workflow in JSON. Create greetings_workflow.json with the following:
{
  "name": "greetings",
  "description": "Sample greetings workflow",
  "version": 1,
  "tasks": [
    {
      "name": "greet",
      "taskReferenceName": "greet_ref",
      "type": "SIMPLE",
      "inputParameters": {
        "name": "${workflow.input.name}"
      }
    }
  ],
  "timeoutPolicy": "TIME_OUT_WF",
  "timeoutSeconds": 60
}
Workflows must be registered with the Conductor server. Use the API to register the greetings workflow from the JSON file above:
curl -X POST -H "Content-Type:application/json" \
http://localhost:8080/api/metadata/workflow -d @greetings_workflow.json
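If you prefer to stay in Python, the same registration call can be prepared with the standard library. This is a minimal sketch, assuming the local server URL from the curl example; `build_register_request` is a hypothetical helper, not part of the SDK, and it only builds the request without sending it.

```python
import json
import urllib.request

# Endpoint used by the curl example (assumes a Conductor server running locally)
METADATA_URL = "http://localhost:8080/api/metadata/workflow"


def build_register_request(workflow_def: dict, url: str = METADATA_URL) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST that registers a workflow definition."""
    body = json.dumps(workflow_def).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A trimmed-down copy of greetings_workflow.json
greetings_def = {
    "name": "greetings",
    "version": 1,
    "tasks": [{"name": "greet", "taskReferenceName": "greet_ref", "type": "SIMPLE",
               "inputParameters": {"name": "${workflow.input.name}"}}],
}
request = build_register_request(greetings_def)
```

To actually register the workflow, pass the request to `urllib.request.urlopen(request)` while the server is running; in practice you would load the dictionary from greetings_workflow.json with `json.load`.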
[!note] To use the Conductor API, the Conductor server must be up and running (see the standalone Docker instructions below).
In Python, a worker is a function decorated with @worker_task. Create a greetings_worker.py file as illustrated below:
[!note] A single workflow can have task workers written in different languages and deployed anywhere, making your workflow polyglot and distributed!
from conductor.client.worker.worker_task import worker_task
@worker_task(task_definition_name='greet')
def greet(name: str) -> str:
    return f'Hello {name}'
Now, we are ready to write our main application, which will execute our workflow.
Let's add helloworld.py with a main method:
from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
from conductor.client.workflow.conductor_workflow import ConductorWorkflow
from conductor.client.workflow.executor.workflow_executor import WorkflowExecutor
from greetings_workflow import greetings_workflow
def register_workflow(workflow_executor: WorkflowExecutor) -> ConductorWorkflow:
    workflow = greetings_workflow(workflow_executor=workflow_executor)
    workflow.register(True)
    return workflow
def main():
    # The app is connected to http://localhost:8080/api by default
    api_config = Configuration()
    workflow_executor = WorkflowExecutor(configuration=api_config)
    # Registering the workflow (Required only when the app is executed the first time)
    workflow = register_workflow(workflow_executor)
    # Starting the worker polling mechanism
    task_handler = TaskHandler(configuration=api_config)
    task_handler.start_processes()
    workflow_run = workflow_executor.execute(name=workflow.name, version=workflow.version,
                                             workflow_input={'name': 'Orkes'})
    print(f'\nworkflow result: {workflow_run.output["result"]}\n')
    print(f'see the workflow execution here: {api_config.ui_host}/execution/{workflow_run.workflow_id}\n')
    task_handler.stop_processes()
if __name__ == '__main__':
    main()
Set the following environment variable to point the SDK to the Conductor Server API endpoint:
export CONDUCTOR_SERVER_URL=http://localhost:8080/api
To start the Conductor server in standalone mode from a Docker image, type the command below:
docker run --init -p 8080:8080 -p 5000:5000 conductoross/conductor-standalone:3.15.0
To ensure the server has started successfully, open the Conductor UI at http://localhost:5000.
To run the application, type the following command:
python helloworld.py
The workflow is now executed, and its execution status can be viewed in the Conductor UI (http://localhost:5000).
Navigate to the Executions tab to view the workflow execution.
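To run the workflow in Orkes Conductor, point the SDK at your cluster:
export CONDUCTOR_SERVER_URL=https://[cluster-name].orkesconductor.io/api
To use the Orkes Conductor Playground instead, set the server URL as follows:
export CONDUCTOR_SERVER_URL=https://play.orkes.io/api
Orkes Conductor requires authentication, so also set your key and secret:
export CONDUCTOR_AUTH_KEY=your_key
export CONDUCTOR_AUTH_SECRET=your_key_secret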
Run the application and view the execution status in Conductor's UI console.
[!NOTE] That's it! You just created and executed your first distributed Python app!
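There are three main ways you can use Conductor when building durable, resilient, distributed applications: writing workers that implement your business logic, creating workflows that orchestrate those workers, and using the SDK and APIs to manage workflow executions from your application.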
A workflow task represents a unit of business logic that achieves a specific goal, such as checking inventory, initiating a payment transfer, etc. A worker implements a task in the workflow.
Workers can be implemented by writing a simple Python function and annotating it with @worker_task. Conductor workers are services (similar to microservices) that follow the Single Responsibility Principle.
Workers can be hosted along with the workflow or run in a distributed environment where a single workflow uses workers deployed and running on different machines/VMs/containers. Whether to keep all the workers in the same application or run them as a distributed application is a design and architectural choice. Conductor is well suited for both scenarios.
You can create or convert any existing Python function to a distributed worker by adding the @worker_task annotation to it. Here is a simple worker that takes name as input and returns a greeting:
from conductor.client.worker.worker_task import worker_task
@worker_task(task_definition_name='greetings')
def greetings(name: str) -> str:
    return f'Hello, {name}'
A worker can take inputs that are primitives (str, int, float, bool, etc.) or complex data classes.
Here is an example worker that uses a dataclass as part of the worker input.
from conductor.client.worker.worker_task import worker_task
from dataclasses import dataclass
@dataclass
class OrderInfo:
    order_id: int
    sku: str
    quantity: int
    sku_price: float
@worker_task(task_definition_name='process_order')
def process_order(order_info: OrderInfo) -> str:
    return f'order: {order_info.order_id}'
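Conceptually, the task input arrives as a JSON object whose keys must match the dataclass fields. The sketch below shows that mapping with the standard library only; `to_order_info` is a hypothetical helper, and the SDK's own conversion logic may differ (for example, in how it treats extra or missing keys).

```python
from dataclasses import dataclass, fields


@dataclass
class OrderInfo:
    order_id: int
    sku: str
    quantity: int
    sku_price: float


def to_order_info(task_input: dict) -> OrderInfo:
    """Hypothetical helper: build OrderInfo from a task input dict, ignoring keys that are not fields."""
    known = {f.name for f in fields(OrderInfo)}
    return OrderInfo(**{k: v for k, v in task_input.items() if k in known})


# Task input as it might appear in a workflow execution (a JSON object)
order = to_order_info({"order_id": 42, "sku": "SKU-1", "quantity": 2, "sku_price": 9.5, "note": "ignored"})
```

Keeping the input shape in a dataclass gives the worker typed attribute access (`order.quantity`) instead of dictionary lookups.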
Workers use a polling mechanism (with a long poll) to check for any available tasks from the server periodically. The startup and shutdown of workers are handled by the conductor.client.automator.task_handler.TaskHandler class.
from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
def main():
    # points to http://localhost:8080/api by default
    api_config = Configuration()
    task_handler = TaskHandler(
        workers=[],
        configuration=api_config,
        scan_for_annotated_workers=True,
        import_modules=['greetings']  # import workers from this module - leave empty if all the workers are in the same module
    )
    # start worker polling
    task_handler.start_processes()
    # Call to stop the workers when the application is ready to shut down
    task_handler.stop_processes()
if __name__ == '__main__':
    main()
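To make the polling model concrete, here is a toy, in-process simulation: a queue stands in for the server's task queue, and a worker loop polls it, runs the function, and records the output. This only illustrates the idea and is not how TaskHandler is implemented (the real task runners poll the server over HTTP from separate processes and report results back); `poll_and_execute` is hypothetical.

```python
import queue
from typing import Callable


def poll_and_execute(task_queue: "queue.Queue[dict]", worker: Callable[..., object],
                     results: dict, poll_timeout: float = 0.1) -> None:
    """Poll until the queue stays empty for poll_timeout seconds; run each task and store its output."""
    while True:
        try:
            # A blocking get with a timeout plays the role of a long poll
            task = task_queue.get(timeout=poll_timeout)
        except queue.Empty:
            return  # nothing left; a real worker would keep polling
        results[task["task_id"]] = {"result": worker(**task["input"])}
        task_queue.task_done()


def greetings(name: str) -> str:
    return f"Hello, {name}"


tasks: "queue.Queue[dict]" = queue.Queue()
tasks.put({"task_id": "t1", "input": {"name": "Orkes"}})
tasks.put({"task_id": "t2", "input": {"name": "Conductor"}})
outputs: dict = {}
poll_and_execute(tasks, greetings, outputs)
```

Because the worker pulls work rather than receiving pushed requests, a busy worker simply polls less often; no inbound endpoint is needed.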
Each worker embodies the design pattern and follows certain basic principles:
A system task worker is a pre-built, general-purpose worker in your Conductor server distribution.
System tasks automate repeated tasks such as calling an HTTP endpoint, executing lightweight ECMA-compliant JavaScript code, publishing to an event broker, etc.
[!tip] Wait is a powerful way to have your system wait for a specific trigger, such as an external event, a particular date/time, or duration, such as 2 hours, without having to manage threads, background processes, or jobs.
from conductor.client.workflow.task.wait_task import WaitTask
# waits for 2 seconds before scheduling the next task
wait_for_two_sec = WaitTask(task_ref_name='wait_for_2_sec', wait_for_seconds=2)
# wait until the end of January
wait_till_jan = WaitTask(task_ref_name='wait_till_jan', wait_until='2024-01-31 00:00 UTC')
# waits until an API call or an event is triggered
wait_for_signal = WaitTask(task_ref_name='wait_for_signal')
{
  "name": "wait",
  "taskReferenceName": "wait_till_jan_end",
  "type": "WAIT",
  "inputParameters": {
    "until": "2024-01-31 00:00 UTC"
  }
}
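The wait_until value in these examples is a date string such as '2024-01-31 00:00 UTC'. A small helper can produce that layout from a timezone-aware datetime. This sketch assumes the 'YYYY-MM-DD HH:MM UTC' format used above is accepted by your server (check the Wait task documentation for other supported formats); `wait_until_string` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def wait_until_string(moment: datetime) -> str:
    """Hypothetical helper: format an aware datetime as 'YYYY-MM-DD HH:MM UTC'."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


end_of_jan = wait_until_string(datetime(2024, 1, 31, tzinfo=timezone.utc))
# Two hours after a fixed start time
two_hours_later = wait_until_string(datetime(2024, 1, 30, 22, 0, tzinfo=timezone.utc) + timedelta(hours=2))
# A local time in UTC+05:30 is converted to UTC before formatting
from_ist = wait_until_string(datetime(2024, 1, 31, 5, 30, tzinfo=timezone(timedelta(hours=5, minutes=30))))
```

Rejecting naive datetimes avoids silently scheduling the wait in the machine's local time zone.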
Make a request to an HTTP(S) endpoint. The task allows for GET, PUT, POST, DELETE, HEAD, and PATCH requests.
from conductor.client.workflow.task.http_task import HttpTask
HttpTask(task_ref_name='call_remote_api', http_input={
'uri': 'https://orkes-api-tester.orkesconductor.com/api'
})
{
  "name": "http_task",
  "taskReferenceName": "http_task_ref",
  "type": "HTTP",
  "inputParameters": {
    "http_request": {
      "uri": "https://orkes-api-tester.orkesconductor.com/api",
      "method": "GET"
    }
  }
}
Execute ECMA-compliant JavaScript code. This is useful when writing a script for data mapping, calculations, etc.
from conductor.client.workflow.task.javascript_task import JavascriptTask
say_hello_js = """
function greetings() {
return {
"text": "hello " + $.name
}
}
greetings();
"""
js = JavascriptTask(task_ref_name='hello_script', script=say_hello_js, bindings={'name': '${workflow.input.name}'})
{
  "name": "inline_task",
  "taskReferenceName": "inline_task_ref",
  "type": "INLINE",
  "inputParameters": {
    "expression": " function greetings() {\n return {\n \"text\": \"hello \" + $.name\n }\n }\n greetings();",
    "evaluatorType": "graaljs",
    "name": "${workflow.input.name}"
  }
}
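Comparing the Python and JSON forms above, the script becomes inputParameters.expression, evaluatorType is set to graaljs, and each binding is merged into inputParameters, where the script reads it as `$.<key>`. The hypothetical helper below reproduces that mapping with plain dictionaries; it is a sketch of the JSON shape, not the SDK's serializer, and the default task name is an assumption.

```python
import json


def inline_task_def(task_ref_name: str, script: str, bindings: dict, name: str = "inline_task") -> dict:
    """Hypothetical helper: build an INLINE task definition shaped like the JSON example."""
    input_parameters = {"expression": script, "evaluatorType": "graaljs"}
    input_parameters.update(bindings)  # each binding is visible to the script as $.<key>
    return {
        "name": name,
        "taskReferenceName": task_ref_name,
        "type": "INLINE",
        "inputParameters": input_parameters,
    }


say_hello_js = 'function greetings() { return { "text": "hello " + $.name } } greetings();'
task = inline_task_def("hello_script", say_hello_js, {"name": "${workflow.input.name}"})
print(json.dumps(task, indent=2))
```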
jq is like sed for JSON data: you can slice, filter, map, and transform structured data with the same ease that sed, awk, grep, and friends let you play with text.
from conductor.client.workflow.task.json_jq_task import JsonJQTask
jq_script = """
{ key3: (.key1.value1 + .key2.value2) }
"""
jq = JsonJQTask(task_ref_name='jq_process', script=jq_script)
{
  "name": "json_transform_task",
  "taskReferenceName": "json_transform_task_ref",
  "type": "JSON_JQ_TRANSFORM",
  "inputParameters": {
    "key1": { "value1": ["a", "b"] },
    "key2": { "value2": ["c", "d"] },
    "queryExpression": "{ key3: (.key1.value1 + .key2.value2) }"
  }
}
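When key1.value1 and key2.value2 are arrays, jq's `+` concatenates them, so the query produces a single key3 array. The plain-Python equivalent below exists only to explain what the query computes for list inputs; Conductor evaluates the real jq expression on the server, and `jq_key3` is a hypothetical name.

```python
def jq_key3(task_input: dict) -> dict:
    """Python equivalent of the jq query { key3: (.key1.value1 + .key2.value2) } for list values."""
    return {"key3": task_input["key1"]["value1"] + task_input["key2"]["value2"]}


example_input = {"key1": {"value1": ["a", "b"]}, "key2": {"value2": ["c", "d"]}}
result = jq_key3(example_input)
```

Note that jq's `+` also adds numbers, joins strings, and merges objects, so the real expression handles more input shapes than this list-only sketch.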
[!tip] Workers are a lightweight alternative to exposing an HTTP endpoint and orchestrating using HTTP tasks. Using workers is a recommended approach if you do not need to expose the service over HTTP or gRPC endpoints.
There are several advantages to this approach:
Conductor workers can run in a cloud-native environment or on-prem and can easily be deployed like any other Python application. Workers can run in a containerized environment, on VMs, or on bare metal, just as you would deploy your other Python applications.
A workflow can be defined as a collection of tasks and operators that specify the order and execution of the defined tasks. This orchestration can span a hybrid ecosystem of serverless functions, microservices, and monolithic applications.
This section will dive deeper into creating and executing Conductor workflows using Python SDK.
Conductor lets you create the workflows using either Python or JSON as the configuration.
Using Python as code to define and execute workflows lets you build extremely powerful, dynamic workflows and run them on Conductor.
When the workflows are relatively static, they can be designed using the Orkes UI (available when using Orkes Conductor) and APIs or SDKs to register and run the workflows.
Both the code and configuration approaches are equally powerful and similar in nature to how you treat Infrastructure as Code.
For cases where the workflows cannot be created statically ahead of time, Conductor is a powerful dynamic workflow execution platform that lets you create very complex workflows in code and execute them. It is useful when the workflow is unique for each execution.
from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
from conductor.client.orkes_clients import OrkesClients
from conductor.client.worker.worker_task import worker_task
from conductor.client.workflow.conductor_workflow import ConductorWorkflow
# The @worker_task annotation denotes that this is a worker
@worker_task(task_definition_name='get_user_email')
def get_user_email(userid: str) -> str:
    return f'{userid}@example.com'
# The @worker_task annotation denotes that this is a worker
@worker_task(task_definition_name='send_email')
def send_email(email: str, subject: str, body: str):
    print(f'sending email to {email} with subject {subject} and body {body}')
def main():
    # defaults to reading the configuration using following env variables
    # CONDUCTOR_SERVER_URL : conductor server e.g. https://play.orkes.io/api
    # CONDUCTOR_AUTH_KEY : API Authentication Key
    # CONDUCTOR_AUTH_SECRET: API Auth Secret
    api_config = Configuration()
    task_handler = TaskHandler(configuration=api_config)
    # Start polling
    task_handler.start_processes()
    clients = OrkesClients(configuration=api_config)
    workflow_executor = clients.get_workflow_executor()
    workflow = ConductorWorkflow(name='dynamic_workflow', version=1, executor=workflow_executor)
    get_email = get_user_email(task_ref_name='get_user_email_ref', userid=workflow.input('userid'))
    sendmail = send_email(task_ref_name='send_email_ref', email=get_email.output('result'), subject='Hello from Orkes',
                          body='Test Email')
    # Order of task execution
    workflow >> get_email >> sendmail
    # Configure the output of the workflow
    workflow.output_parameters(output_parameters={
        'email': get_email.output('result')
    })
    # Run the workflow
    result = workflow.execute(workflow_input={'userid': 'user_a'})
    print(f'\nworkflow output: {result.output}\n')
    # Stop polling
    task_handler.stop_processes()
if __name__ == '__main__':
    main()
>> python3 dynamic_workflow.py
2024-02-03 19:54:35,700 [32853] conductor.client.automator.task_handler INFO created worker with name=get_user_email and domain=None
2024-02-03 19:54:35,781 [32853] conductor.client.automator.task_handler INFO created worker with name=send_email and domain=None
2024-02-03 19:54:35,859 [32853] conductor.client.automator.task_handler INFO TaskHandler initialized
2024-02-03 19:54:35,859 [32853] conductor.client.automator.task_handler INFO Starting worker processes...
2024-02-03 19:54:35,861 [32853] conductor.client.automator.task_runner INFO Polling task get_user_email with domain None with polling interval 0.1
2024-02-03 19:54:35,861 [32853] conductor.client.automator.task_handler INFO Started 2 TaskRunner process
2024-02-03 19:54:35,862 [32853] conductor.client.automator.task_handler INFO Started all processes
2024-02-03 19:54:35,862 [32853] conductor.client.automator.task_runner INFO Polling task send_email with domain None with polling interval 0.1
sending email to user_a@example.com with subject Hello from Orkes and body Test Email
workflow output: {'email': 'user_a@example.com'}
2024-02-03 19:54:36,309 [32853] conductor.client.automator.task_handler INFO Stopped worker processes...
See dynamic_workflow.py for a fully functional example.
For a more complex workflow example with all the supported features, see kitchensink.py.
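Because the workflow is assembled at runtime, its shape can vary from one execution to the next. The sketch below illustrates that idea with plain dictionaries: it builds a sequential workflow definition from a list of task names chosen at runtime, wiring each task's input to the previous task's output with Conductor's `${ref.output.result}` expressions. `build_sequential_workflow`, the `_ref` suffix, and the `value` input name are hypothetical conventions; with the SDK you would chain task objects with `>>` as in dynamic_workflow.py.

```python
def build_sequential_workflow(name: str, task_names: list[str], version: int = 1) -> dict:
    """Hypothetical helper: chain SIMPLE tasks so each receives the previous task's 'result' output."""
    tasks = []
    previous_ref = None
    for task_name in task_names:
        ref = f"{task_name}_ref"
        if previous_ref is None:
            source = "${workflow.input.value}"  # the first task reads the workflow input
        else:
            source = "${" + previous_ref + ".output.result}"  # later tasks read the previous output
        tasks.append({
            "name": task_name,
            "taskReferenceName": ref,
            "type": "SIMPLE",
            "inputParameters": {"value": source},
        })
        previous_ref = ref
    output = {"result": "${" + previous_ref + ".output.result}"} if previous_ref else {}
    return {"name": name, "version": version, "tasks": tasks, "outputParameters": output}


# The task list could differ for every execution, e.g. based on user input
workflow_def = build_sequential_workflow("dynamic_workflow", ["get_user_email", "send_email"])
```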
The WorkflowClient interface provides all the APIs required to work with workflow executions.
from conductor.client.configuration.configuration import Configuration
from conductor.client.orkes_clients import OrkesClients
api_config = Configuration()
clients = OrkesClients(configuration=api_config)
workflow_client = clients.get_workflow_client()
Starting a workflow asynchronously is useful when workflows are long-running.
from conductor.client.http.models import StartWorkflowRequest
request = StartWorkflowRequest()
request.name = 'hello'
request.version = 1
request.input = {'name': 'Orkes'}
# workflow id is the unique execution id associated with this execution
workflow_id = workflow_client.start_workflow(request)
Executing a workflow synchronously (waiting for it to complete) is applicable when workflows complete very quickly, usually in under 20-30 seconds.
from conductor.client.http.models import StartWorkflowRequest
request = StartWorkflowRequest()
request.name = 'hello'
request.version = 1
request.input = {'name': 'Orkes'}
workflow_run = workflow_client.execute_workflow(
start_workflow_request=request,
wait_for_seconds=12)
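When a workflow may outlive the synchronous wait window, a common pattern is to start it asynchronously and poll its status until it reaches a terminal state. The sketch keeps that loop independent of the SDK by taking a `get_status` callable; with the SDK you might pass something like `lambda: workflow_client.get_workflow(workflow_id, include_tasks=False).status`, which is an assumption to verify against your SDK version. `wait_for_terminal_status` is hypothetical.

```python
import time
from typing import Callable

# Workflow statuses that do not change without an explicit action (retry, restart, rerun)
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TIMED_OUT", "TERMINATED"}


def wait_for_terminal_status(get_status: Callable[[], str], timeout_s: float = 60.0,
                             interval_s: float = 1.0) -> str:
    """Poll get_status() until it returns a terminal status, or raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow still {status} after {timeout_s}s")
        time.sleep(interval_s)


# Simulated status sequence standing in for real server responses
statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
final_status = wait_for_terminal_status(lambda: next(statuses), timeout_s=5, interval_s=0.01)
```

time.monotonic is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.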
[!note] See workflow_ops.py for a fully working application that demonstrates working with the workflow executions and sending signals to the workflow to manage its state.
Workflows represent the application state. With Conductor, you can query the workflow execution state anytime during its lifecycle. You can also send signals to the workflow that determine the outcome of the workflow state.
WorkflowClient is the client interface used to manage workflow executions.
from conductor.client.configuration.configuration import Configuration
from conductor.client.orkes_clients import OrkesClients
api_config = Configuration()
clients = OrkesClients(configuration=api_config)
workflow_client = clients.get_workflow_client()
The following method lets you query the status of a workflow execution given its ID. When include_tasks is set, the response also includes all the completed and in-progress tasks.
get_workflow(workflow_id: str, include_tasks: Optional[bool] = True) -> Workflow
Variables inside a workflow are the equivalent of global variables in a program. The following method updates them for a given workflow execution:
update_variables(self, workflow_id: str, variables: dict[str, object] = {})
Used to terminate a running workflow. Any pending tasks are canceled, and no further work is scheduled for this workflow upon termination. If the workflow has a failure workflow configured, it is triggered only when trigger_failure_workflow is set to True (the default is False).
terminate_workflow(self, workflow_id: str, reason: Optional[str] = None, trigger_failure_workflow: bool = False)
If the workflow has failed because a task failed after exhausting its retries, the workflow can still be retried from the failed task by calling retry_workflow.
retry_workflow(self, workflow_id: str, resume_subworkflow_tasks: Optional[bool] = False)
When a sub-workflow inside a workflow has failed, there are two options: re-trigger the sub-workflow from the start (the default behavior), or resume the sub-workflow from the failed task (set resume_subworkflow_tasks to True).
A workflow in a terminal state (COMPLETED, TERMINATED, FAILED) can be restarted from the beginning. This is useful when retrying from the last failed task is insufficient and the whole workflow must be started again.
restart_workflow(self, workflow_id: str, use_latest_def: Optional[bool] = False)
In cases where a workflow needs to be restarted from a specific task rather than from the beginning, rerun provides that option. When issuing the rerun command, you can specify the task ID from which the workflow should be restarted and, optionally, change the workflow's input.
rerun_workflow(self, workflow_id: str, rerun_workflow_request: RerunWorkflowRequest)
[!tip] Rerun is one of the most powerful features Conductor has, giving you unparalleled control over the workflow restart.
A running workflow can be put into PAUSED status. A paused workflow lets the currently running tasks complete but does not schedule any new tasks until resumed.
pause_workflow(self, workflow_id: str)
The resume operation resumes a paused workflow, immediately evaluating its state and scheduling the next set of tasks.
resume_workflow(self, workflow_id: str)
Workflow executions are retained until removed from the Conductor. This gives complete visibility into all the executions an application has - regardless of the number of executions. Conductor has a powerful search API that allows you to search for workflow executions.
search(self, start, size, free_text: str = '*', query: str = None) -> ScrollableSearchResultWorkflowSummary
Here are the supported fields for query:
| Field | Description |
|---|---|
| status | The status of the workflow. |
| correlationId | The ID used to correlate the workflow execution with other executions. |
| workflowType | The name of the workflow. |
| version | The version of the workflow. |
| startTime | The start time of the workflow, in milliseconds since the epoch. |
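The query argument combines these fields into a filter expression. The helper below assembles one; it assumes the `field IN (a,b)` and `startTime > <epoch millis>` clause syntax joined with `AND`, so confirm the exact grammar against your server's search documentation before relying on it. `build_search_query` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def build_search_query(workflow_types: Optional[list[str]] = None,
                       statuses: Optional[list[str]] = None,
                       started_after: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a search query by joining the given clauses with AND."""
    clauses = []
    if workflow_types:
        clauses.append(f"workflowType IN ({','.join(workflow_types)})")
    if statuses:
        clauses.append(f"status IN ({','.join(statuses)})")
    if started_after is not None:
        # startTime is compared in epoch milliseconds
        clauses.append(f"startTime > {int(started_after.timestamp() * 1000)}")
    return " AND ".join(clauses)


query = build_search_query(["hello"], ["FAILED", "TIMED_OUT"],
                           datetime(2024, 1, 1, tzinfo=timezone.utc))
```

The result could then be passed as `workflow_client.search(0, 20, query=query)`, following the search signature listed above.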
Conductor lets you embrace failures rather than worry about the complexities introduced in the system to handle failures.
All the aspects of handling failures, retries, rate limits, etc., are driven by the configuration that can be updated in real time without re-deploying your application.
Each task in the Conductor workflow can be configured to handle failures with retries, along with the retry policy (linear, fixed, exponential backoff) and maximum number of retry attempts allowed.
See Error Handling for more details.
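To get a feel for the retry policies, the sketch below computes the delay before each retry attempt. It assumes the formulas used by the Conductor OSS server (FIXED: retryDelaySeconds; LINEAR_BACKOFF: retryDelaySeconds * backoffScaleFactor * attempt; EXPONENTIAL_BACKOFF: retryDelaySeconds * 2^(attempt - 1)), which you should verify against your server version; `retry_delays` is hypothetical.

```python
def retry_delays(policy: str, retry_count: int, retry_delay_seconds: int,
                 backoff_scale_factor: int = 1) -> list[int]:
    """Delay in seconds before each of the retry_count retries, under the assumed formulas."""
    delays = []
    for attempt in range(1, retry_count + 1):
        if policy == "FIXED":
            delays.append(retry_delay_seconds)
        elif policy == "LINEAR_BACKOFF":
            delays.append(retry_delay_seconds * backoff_scale_factor * attempt)
        elif policy == "EXPONENTIAL_BACKOFF":
            delays.append(retry_delay_seconds * 2 ** (attempt - 1))
        else:
            raise ValueError(f"unknown retry policy: {policy}")
    return delays


for policy in ("FIXED", "LINEAR_BACKOFF", "EXPONENTIAL_BACKOFF"):
    print(policy, retry_delays(policy, retry_count=3, retry_delay_seconds=1))
```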
What happens when a task is operating on a critical resource that can only handle a few requests at a time? Tasks can be configured to have a fixed concurrency (X requests at a time) or a rate (Y tasks per time window).
from conductor.client.configuration.configuration import Configuration
from conductor.client.http.models import TaskDef
from conductor.client.orkes_clients import OrkesClients
def main():
    api_config = Configuration()
    clients = OrkesClients(configuration=api_config)
    metadata_client = clients.get_metadata_client()
    task_def = TaskDef()
    task_def.name = 'task_with_retries'
    task_def.retry_count = 3
    task_def.retry_logic = 'LINEAR_BACKOFF'
    task_def.retry_delay_seconds = 1
    # only allow 3 tasks at a time to be in the IN_PROGRESS status
    task_def.concurrent_exec_limit = 3
    # timeout the task if not polled within 60 seconds of scheduling
    task_def.poll_timeout_seconds = 60
    # timeout the task if the task does not COMPLETE in 2 minutes
    task_def.timeout_seconds = 120
    # for long-running tasks, time out if the task is not updated to COMPLETED or IN_PROGRESS status
    # within 60 seconds of the last update
    task_def.response_timeout_seconds = 60
    # only allow 100 executions in a 10-second window (complementary to concurrent_exec_limit)
    task_def.rate_limit_per_frequency = 100
    task_def.rate_limit_frequency_in_seconds = 10
    metadata_client.register_task_def(task_def=task_def)
if __name__ == '__main__':
    main()
{
  "name": "task_with_retries",
  "retryCount": 3,
  "retryLogic": "LINEAR_BACKOFF",
  "retryDelaySeconds": 1,
  "backoffScaleFactor": 1,
  "timeoutSeconds": 120,
  "responseTimeoutSeconds": 60,
  "pollTimeoutSeconds": 60,
  "timeoutPolicy": "TIME_OUT_WF",
  "concurrentExecLimit": 3,
  "rateLimitPerFrequency": 100,
  "rateLimitFrequencyInSeconds": 10
}
POST /api/metadata/taskdef -d @task_def.json
See task_configure.py for a detailed working app.
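The two throttling settings behave differently: concurrent_exec_limit caps how many tasks can be IN_PROGRESS at once, while rate_limit_per_frequency and rate_limit_frequency_in_seconds cap how many tasks may start within a time window. The toy model below (a sliding window driven by explicit timestamps, so it is deterministic) illustrates that difference. It is a hypothetical sketch, not Conductor's scheduler, whose exact windowing may differ.

```python
from collections import deque


class TaskThrottle:
    """Toy model of a task definition's concurrency and rate limits (hypothetical, for illustration)."""

    def __init__(self, concurrent_limit: int, rate_limit: int, window_seconds: float):
        self.concurrent_limit = concurrent_limit
        self.rate_limit = rate_limit
        self.window_seconds = window_seconds
        self.in_progress = 0
        self.starts: deque = deque()  # start timestamps inside the current window

    def try_start(self, now: float) -> bool:
        """Return True if a task may move to IN_PROGRESS at time `now` (seconds)."""
        while self.starts and now - self.starts[0] >= self.window_seconds:
            self.starts.popleft()  # drop starts that fell out of the window
        if self.in_progress >= self.concurrent_limit or len(self.starts) >= self.rate_limit:
            return False
        self.in_progress += 1
        self.starts.append(now)
        return True

    def complete(self) -> None:
        """Mark one IN_PROGRESS task as finished, freeing a concurrency slot."""
        self.in_progress = max(0, self.in_progress - 1)


# Values from task_with_retries: 3 concurrent, 100 starts per 10 seconds
throttle = TaskThrottle(concurrent_limit=3, rate_limit=100, window_seconds=10)
first_four = [throttle.try_start(now=0.0) for _ in range(4)]  # the fourth is blocked by concurrency
```

Completing a task frees a concurrency slot immediately, whereas a rate-limited start only becomes possible once older starts leave the window.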
Conductor SDKs are lightweight and can easily be added to your existing or new Python app. This section will dive deeper into integrating Conductor in your application.
Conductor Python SDKs are published on PyPI at https://pypi.org/project/conductor-python/:
pip3 install conductor-python
The Conductor SDK for Python provides a full-featured testing framework for your workflow-based applications. It works well with any testing framework you prefer, without imposing a specific one.
The Conductor server provides a test endpoint, POST /api/workflow/test, that allows you to post a workflow along with the test execution data to evaluate the workflow.
The goal of the test framework is as follows:
Here are example assertions from the test:
...
test_request = WorkflowTestRequest(name=wf.name, version=wf.version,
task_ref_to_mock_output=task_ref_to_mock_output,
workflow_def=wf.to_workflow_def())
run = workflow_client.test_workflow(test_request=test_request)
print(f'completed the test run')
print(f'status: {run.status}')
self.assertEqual(run.status, 'COMPLETED')
...
[!note] Workflow workers are your regular Python functions and can be tested with any available testing framework.
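For example, the greetings worker's logic can be unit-tested with the standard unittest module and no Conductor server. The sketch tests an undecorated copy of the function so that it runs without the SDK installed; in your project you would import the real worker module instead.

```python
import unittest


def greetings(name: str) -> str:
    # Same body as the @worker_task('greetings') worker, without the decorator
    return f'Hello, {name}'


class GreetingsWorkerTest(unittest.TestCase):
    def test_greets_by_name(self):
        self.assertEqual(greetings('Orkes'), 'Hello, Orkes')

    def test_empty_name(self):
        self.assertEqual(greetings(''), 'Hello, ')
```

Run it with `python -m unittest` (or pytest), alongside the workflow-level tests.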
See test_workflows.py for a fully functional example of how to test a moderately complex workflow with branches.
[!tip] Treat your workflow definitions just like your code. If you define workflows using the UI, we recommend checking the JSON configuration into version control and using your CI/CD pipeline to promote the workflow definitions across environments such as Dev, Test, and Prod.
Here is a recommended approach when defining workflows using JSON: use the POST /api/metadata/* endpoints or MetadataClient (from conductor.client.metadata_client import MetadataClient) to register/update workflows as part of the deployment process.
A powerful feature of Conductor is the ability to version workflows. You should increment the version of the workflow when there is a significant change to the definition. You can run multiple versions of the workflow at the same time. When starting a new workflow execution, use the version field to specify which version to use. When omitted, the latest (highest-numbered) version is used.
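The version rule can be illustrated with a tiny resolver: given the registered versions of a workflow, an explicit version must exist, and an omitted version resolves to the highest one. `resolve_version` is a hypothetical helper that mirrors the behavior described here; it is not an SDK function.

```python
from typing import Optional


def resolve_version(registered: dict[str, list[int]], name: str, version: Optional[int] = None) -> int:
    """Return the workflow version a new execution would use under the described rule."""
    versions = registered.get(name)
    if not versions:
        raise KeyError(f"workflow {name!r} is not registered")
    if version is None:
        return max(versions)  # latest (highest-numbered) version
    if version not in versions:
        raise ValueError(f"version {version} of {name!r} is not registered")
    return version


registered = {"greetings": [1, 2, 3]}
```

Pinning an explicit version in production callers keeps running applications stable while a newer version is rolled out.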