
parallex
System Requirements
Python >= 3.8
install
Default object store
pip install tx-parallex
Plasma store https://arrow.apache.org/
pip install tx-parallex[arrow]
Install from source
- Clone the repo
- Easy install instructions:
# Create a virtual environment called 'px'
conda create -n px python=3.8
# start-up the environment you just created
conda activate px
# install the rest of the tx-parallex pre-requirements
pip install -r requirements.txt
# run the tests, a number of test 'specs'
PYTHONPATH=src pytest -x -vv --full-trace -s --timeout 60
# deactivate the environment (if desired)
conda deactivate
set log level
Set the environment variable LOG_LEVEL to one of the levels accepted by setLevel in Python's logging library.
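As a rough illustration (not tx-parallex's own code), the sketch below shows how a LOG_LEVEL value such as DEBUG maps onto the numeric levels of Python's logging module. The helper name resolve_log_level and the INFO fallback are assumptions made for this example.

```python
import logging
import os


def resolve_log_level(default="INFO"):
    """Map the LOG_LEVEL environment variable to a numeric logging level.

    Hypothetical helper: the name and the INFO default are assumptions.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    # For a known level name, getLevelName returns the numeric level;
    # for an unknown name it returns a string such as "Level FOO".
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    return level


os.environ["LOG_LEVEL"] = "DEBUG"
logging.basicConfig(level=resolve_log_level())
```

Valid names are the standard ones: DEBUG, INFO, WARNING, ERROR, and CRITICAL.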
Introduction
A queue with dependencies
Usage
from tx.parallex import run_python
ret = run_python(number_of_workers = 4, pyf = "spec.py", dataf = "data.yml")
Spec
tx-parallex specs can be written in YAML or a Python-like DSL. The Python-like DSL is translated to YAML by tx-parallex. Each object in a spec specifies a task. When the task is executed, it is given a dict called data. The pipeline will return a dictionary.
YAML
The examples below assume that a function sqr, which returns the square of its argument, is defined in a module named math.
def sqr(x):
    return x * x
let
The let task sets data for its subtask. It adds a new var/value pair to data within the scope of its subtask, and then executes that subtask.
Syntax:
type: let
var: <var>
obj: <value>
sub: <subtask>
Example:
type: let
var: a
obj:
  data: 1
sub:
  type: python
  name: y
  mod: math
  func: sqr
  params:
    x:
      name: a
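The scoping rule can be sketched in plain Python. This is a minimal illustration of the semantics described above, not tx-parallex's implementation; run_let and the run_subtask callback are hypothetical names.

```python
def run_let(task, data, run_subtask):
    """Evaluate a let task: bind var to a value, then run the subtask."""
    obj = task["obj"]
    # A value is either a constant ({"data": ...}) or a reference ({"name": ...}).
    value = obj["data"] if "data" in obj else data[obj["name"]]
    # Build a new dict so the binding is visible only to the subtask.
    scoped = {**data, task["var"]: value}
    return run_subtask(task["sub"], scoped)


outer = {}
task = {"type": "let", "var": "a", "obj": {"data": 1}, "sub": "stand-in subtask"}
# The stand-in callback simply reports the data the subtask would receive.
seen = run_let(task, outer, lambda sub, data: dict(data))
print(seen)   # {'a': 1}
print(outer)  # {}
```

The outer dict is left unchanged, which is what "within the scope of its subtask" means here.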
map
The map task reads a list coll from data and applies a subtask to each member of the list. Each member is assigned to var in the data passed to the corresponding subtask.
Syntax:
type: map
coll: <value>
var: <variable name>
sub: <subtask>
<value> is an object of one of the following forms:
A reference to an entry in data or to the name of a task:
"name": <variable name>
A constant:
"data": <constant>
Example:
type: map
coll:
  data:
  - 1
  - 2
  - 3
var: a
sub:
  type: python
  name: y
  mod: math
  func: sqr
  params:
    x:
      name: a
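A minimal sketch of the per-element binding follows. It runs the subtask sequentially for clarity, whereas the real engine may schedule the elements on separate workers; run_map and run_subtask are hypothetical names, and sqr is defined locally as in the assumption above.

```python
def run_map(task, data, run_subtask):
    """Apply the subtask once per member of the collection."""
    coll = task["coll"]
    members = coll["data"] if "data" in coll else data[coll["name"]]
    results = []
    for member in members:
        # Each subtask sees its own copy of data with var bound to one member.
        scoped = {**data, task["var"]: member}
        results.append(run_subtask(task["sub"], scoped))
    return results


def sqr(x):
    return x * x


task = {"type": "map", "coll": {"data": [1, 2, 3]}, "var": "a", "sub": None}
# The callback stands in for the python subtask that calls sqr on a.
squares = run_map(task, {}, lambda sub, data: sqr(data["a"]))
print(squares)  # [1, 4, 9]
```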
cond
The cond task reads a boolean value. If the value is true, it executes the then task; otherwise it executes the else task.
Syntax:
type: cond
on: <value>
then: <subtask>
else: <subtask>
Example:
type: cond
on:
  data:
    true
then:
  type: ret
  obj:
    data: 1
else:
  type: ret
  obj:
    data: 0
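The branch selection can be illustrated as follows. This is a sketch of the described behaviour only; select_branch is a hypothetical helper and the task is written as the Python dict that the YAML example would parse to.

```python
def select_branch(task, data):
    """Return the subtask that a cond task would execute."""
    on = task["on"]
    # The condition is either a constant or a reference into data.
    condition = on["data"] if "data" in on else data[on["name"]]
    return task["then"] if condition else task["else"]


task = {
    "type": "cond",
    "on": {"data": True},
    "then": {"type": "ret", "obj": {"data": 1}},
    "else": {"type": "ret", "obj": {"data": 0}},
}
chosen = select_branch(task, {})
print(chosen["obj"]["data"])  # 1
```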
python
You can use any Python module.
The python task runs a Python function. It reads parameters from data. The return value must be pickleable.
Syntax:
type: python
name: <name>
mod: <module>
func: <function>
params: <parameters>
<parameters> is an object of the form:
<param> : <value>
...
<param> : <value>
where <param> is either a parameter name or a position.
Example:
type: python
name: y
mod: math
func: sqr
params:
  x:
    data: 1
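To make the name-or-position rule concrete, the sketch below resolves a python task by hand. It is an illustration, not the library's code: call_python_task is a hypothetical helper, and it assumes that integer keys denote positions and string keys denote parameter names. Because math.sqr is itself hypothetical, the standard library's math.pow is used instead.

```python
import importlib


def call_python_task(task, data):
    """Import mod, look up func, and call it with the resolved params."""
    func = getattr(importlib.import_module(task["mod"]), task["func"])
    args, kwargs = {}, {}
    for param, value in task["params"].items():
        resolved = value["data"] if "data" in value else data[value["name"]]
        if isinstance(param, int):
            args[param] = resolved    # assumed: integer key means position
        else:
            kwargs[param] = resolved  # assumed: string key means keyword
    positional = [args[i] for i in sorted(args)]
    return func(*positional, **kwargs)


task = {"type": "python", "name": "y", "mod": "math", "func": "pow",
        "params": {0: {"data": 2}, 1: {"name": "n"}}}
result = call_python_task(task, {"n": 3})
print(result)  # 8.0
```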
top
The top task topologically sorts its subtasks based on their dependencies and ensures that the tasks are executed in parallel in an order compatible with those dependencies.
Syntax:
type: top
sub: <subtasks>
It reads the name properties of subtasks that are not in data.
Example:
type: top
sub:
- type: python
  name: y
  mod: math
  func: sqr
  params:
    x:
      data: 1
- type: python
  name: z
  mod: math
  func: sqr
  params:
    x:
      name: y
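One way to picture the ordering is to group the subtasks into waves, where every task in a wave depends only on earlier waves and could therefore run in parallel. The sketch below is an assumption-laden illustration, not the scheduler tx-parallex uses: execution_waves is a hypothetical helper, and a name reference that matches no sibling task is assumed to come from data.

```python
def execution_waves(subtasks):
    """Group named subtasks into waves with no dependencies inside a wave."""
    names = {task["name"] for task in subtasks}
    # A task depends on every sibling whose name it references in its params.
    deps = {
        task["name"]: {
            value["name"]
            for value in task.get("params", {}).values()
            if "name" in value and value["name"] in names
        }
        for task in subtasks
    }
    waves, done = [], set()
    while len(done) < len(deps):
        # A task is ready once all of its dependencies have completed.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cyclic dependency between subtasks")
        waves.append(ready)
        done.update(ready)
    return waves


subtasks = [
    {"type": "python", "name": "z", "mod": "math", "func": "sqr",
     "params": {"x": {"name": "y"}}},
    {"type": "python", "name": "y", "mod": "math", "func": "sqr",
     "params": {"x": {"data": 1}}},
]
print(execution_waves(subtasks))  # [['y'], ['z']]
```

The subtasks are deliberately listed in reverse order to show that the listing order does not determine the execution order.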
seq
The seq task forces all subtasks to be run sequentially.
Syntax:
type: seq
sub: <subtasks>
It reads the name properties of subtasks that are not in data.
Example:
type: seq
sub:
- type: python
  name: y
  mod: math
  func: sqr
  params:
    x:
      data: 1
- type: python
  name: z
  mod: math
  func: sqr
  params:
    x:
      name: y
ret
The ret task specifies a value to return. The pipeline returns a dictionary. When a task appears under a map task, it is prefixed with the index of the element in that collection, as follows:
<index>
For nested maps, the indices are chained together, as follows:
<index>. ... .<index>
Syntax:
type: ret
obj: <value>
Example:
type: ret
obj:
  name: z
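The index chaining for nested maps can be sketched as a simple join. This is only an illustration of the <index>. ... .<index> pattern above; index_prefix is a hypothetical helper, and the outermost-first ordering of the indices is an assumption.

```python
def index_prefix(indices):
    """Chain map indices with dots (assumed order: outermost map first)."""
    return ".".join(str(i) for i in indices)


print(index_prefix([2]))     # 2
print(index_prefix([0, 2]))  # 0.2
```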
Python
A dsl block contains a subset of Python.
- The semantics differ from Python: an assignment in a block is not visible outside the block.
- Assignments within a block are unordered.
- return statement
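Because assignments are unordered, what matters is which variables each assignment reads, not where it appears in the block. The sketch below uses Python's standard ast module to extract those dependencies. It is an illustration under assumptions, not the translator tx-parallex uses; assignment_dependencies is a hypothetical helper.

```python
import ast


def assignment_dependencies(source):
    """Map each assigned variable to the block-local variables it reads."""
    tree = ast.parse(source)
    reads = {}
    for stmt in tree.body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            target = stmt.targets[0].id
            # Collect every bare name used on the right-hand side.
            reads[target] = {
                node.id for node in ast.walk(stmt.value)
                if isinstance(node, ast.Name)
            }
    assigned = set(reads)
    # Keep only names assigned in this block (drops module names such as math).
    return {target: used & assigned for target, used in reads.items()}


forward = assignment_dependencies("y = math.sqr(1)\nz = math.sqr(y)")
backward = assignment_dependencies("z = math.sqr(y)\ny = math.sqr(1)")
print(forward == backward)  # True
```

Both orderings yield the same dependency map, so either block describes the same computation.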
Available syntax:
import
from <module> import *
from <module> import <func>, ..., <func>
import names from module
<module> absolute module names
assignment
<var> = <const>
where
<const> = <integer> | <number> | <boolean> | <string> | <list> | <dict>
This translates to let.
Example:
a = 1
y = sqr(x=a)
yield y
function application
<var> = [<module>.]<func>(<param>=<expr>, ...) | <expr>
This translates to python.
where <var> is a name, and
<expr> is
<expr> = <expr> if <expr> else <expr> | <expr> <binop> <expr> | <expr> <boolop> <expr> | <expr> <compare> <expr> | <unaryop> <expr> | <var> | <const>
<binop>, <boolop>, <compare>, and <unaryop> are Python's BinOp, BoolOp, Compare, and UnaryOp. <expr> is translated to a set of assignments, a name, or data, depending on its content.
Example:
y = math.sqr(1)
z = math.sqr(y)
return z
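The operator categories in the grammar above correspond to node classes in Python's standard ast module. The short sketch below shows which node class the Python parser assigns to each expression form; expression_kind is a hypothetical helper written for this illustration and is not part of tx-parallex.

```python
import ast


def expression_kind(source):
    """Return the ast node class name of a single Python expression."""
    return type(ast.parse(source, mode="eval").body).__name__


for text in ["a + b", "a and b", "a < b", "not a", "a if c else b", "a", "1"]:
    print(text, "->", expression_kind(text))
# a + b -> BinOp
# a and b -> BoolOp
# a < b -> Compare
# not a -> UnaryOp
# a if c else b -> IfExp
# a -> Name
# 1 -> Constant
```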
parallel for
for <var> in <expr>:
    ...
This translates to map.
Example:
for a in [1, 2, 3]:
    y = math.sqr(a)
    yield y
if
if <expr>:
    ...
else:
    ...
This translates to cond.
Example:
if z:
    yield 1
else:
    yield 0
The semantics of if differ from Python: variables assigned inside an if are not visible outside it.
with
with Seq:
    ...
This translates to seq.
Example:
with Seq:
    y = math.sqr(1)
return y
yield
yield <expr>
This translates to ret.
Example:
y = math.sqr(1)
yield y
Data
Data can be arbitrary YAML.