Pachyderm's Python SDK
Official Python client/SDK for Pachyderm.
The successor to https://github.com/pachyderm/python-pachyderm.
This library provides the autogenerated gRPC/protobuf code for Pachyderm,
generated using a fork of the betterproto package,
along with higher-level functionality.
Installation
pip install pachyderm_sdk
A Small Taste
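Here's an example that creates a repo, adds a file, and reads it back:
from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs

# Connect using the active context in your local Pachyderm config.
client = Client.from_config()

# Create a repo named "test".
repo = pfs.Repo(name="test")
client.pfs.create_repo(repo=repo)

# Open a commit on the master branch and write a file into it.
branch = pfs.Branch.from_uri("test@master")
with client.pfs.commit(branch=branch) as commit:
    file = commit.put_file_from_bytes(path="/data/file.dat", data=b"DATA")

# Read the file back once the commit block has exited.
with client.pfs.pfs_file(file) as f:
    print(f.readall())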
How to load a CSV file into a pandas DataFrame
from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs
import pandas as pd

client = Client.from_config()

# Point at a CSV file on the master branch of the "test" repo.
file = pfs.File.from_uri("test@master:/path/to/data.csv")
with client.pfs.pfs_file(file) as f:
    # pandas reads directly from the file-like object returned by pfs_file.
    df = pd.read_csv(f)
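In the example above, the object returned by client.pfs.pfs_file is passed straight to pd.read_csv, which suggests it behaves like a binary file-like object. If pandas is not available, the standard library csv module can parse the same kind of stream once it is wrapped in a text decoder. This is a minimal sketch, assuming the file is UTF-8 encoded and that the stream behaves like a buffered binary file; read_csv_rows is a hypothetical helper (not part of pachyderm_sdk), and an in-memory io.BytesIO stands in for the Pachyderm file so the example runs without a cluster.

```python
import csv
import io
from typing import BinaryIO, Dict, List


def read_csv_rows(stream: BinaryIO, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Parse a binary CSV stream into a list of dicts keyed by the header row.

    Hypothetical helper for illustration; not part of pachyderm_sdk.
    """
    # TextIOWrapper decodes bytes to str; newline="" is what the csv module expects.
    text = io.TextIOWrapper(stream, encoding=encoding, newline="")
    try:
        return list(csv.DictReader(text))
    finally:
        # Detach so the wrapper does not close the underlying stream.
        text.detach()


# Stand-in for the object returned by client.pfs.pfs_file(file) (an assumption).
fake_file = io.BytesIO(b"name,count\nalpha,1\nbeta,2\n")
rows = read_csv_rows(fake_file)
print(rows)
```

Detaching the wrapper leaves closing the stream to the enclosing `with client.pfs.pfs_file(...)` block, which keeps ownership of the file in one place.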
Changes from Python-Pachyderm
This package is a successor to the python-pachyderm package.
Listed below are some of the notable changes:
- Organization of the API
  - Methods and message objects are now organized according to the service they belong to, e.g. auth, pfs (Pachyderm File System), and pps (Pachyderm Pipeline System).
  - Message objects can be found within their respective submodule of the pachyderm_sdk.api module, e.g. pachyderm_sdk.api.pfs.
  - Methods can be found within their respective attribute of the Client class, e.g. client.pps.create_pipeline.
  - Some methods have been renamed to remove redundancy introduced by this organization, e.g. python_pachyderm.Client.get_enterprise_state -> pachyderm_sdk.Client.enterprise.get_state.
- The autogenerated code is generated using a fork of the betterproto compiler.
  - Messages are now Python dataclasses.
  - Methods require keyword arguments.
- Pachyderm resources are specified using typed message objects (e.g. pfs.Repo) rather than plain strings.
  - python-pachyderm (old):
    client.create_repo("test")
  - pachyderm_sdk (new):
    client.pfs.create_repo(repo=pfs.Repo(name="test"))
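To make the dataclass, keyword-argument, and typed-resource points concrete without a running cluster, the sketch below uses plain dataclasses stand-ins. Repo and FakePfsApi are hypothetical classes that only mimic the general shape of pfs.Repo and client.pfs; they are not the generated code. The sketch shows a typed message with value equality and a keyword-only method signature, mirroring why a positional call in the style of the old client.create_repo("test") is rejected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repo:
    # Hypothetical stand-in for pfs.Repo: a typed message instead of a bare string.
    name: str


@dataclass
class FakePfsApi:
    # Hypothetical stand-in for client.pfs; stores repos in memory.
    repos: List[Repo] = field(default_factory=list)

    def create_repo(self, *, repo: Repo) -> None:
        # The bare `*` makes `repo` keyword-only, so create_repo(Repo(...)) raises TypeError.
        self.repos.append(repo)


pfs_api = FakePfsApi()
pfs_api.create_repo(repo=Repo(name="test"))

try:
    pfs_api.create_repo(Repo(name="other"))  # positional call, old style
except TypeError as err:
    print("rejected:", err)

# Dataclasses compare by field values, so two messages with equal fields are equal.
print(pfs_api.repos[0] == Repo(name="test"))  # True
```

Requiring keywords keeps call sites readable when a generated method takes many optional message fields, at the cost of slightly more verbose calls.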
Contributing
Please see the contributing guide for more information, including testing instructions.
Developer Guide
Generate python APIs from protobuf:
./generate-protos.sh
Generate HTML documentation (writes to docs/pachyderm_sdk):
make docs
Run tests:
pytest -vvv tests