lakeFS High-Level Python SDK
The lakeFS High-Level SDK for Python provides developers with the following features:
- Simpler programming interface with less configuration
- Inferring identity from environment
- Better abstractions for common, more complex operations (I/O, transactions, imports)
Requirements
Python 3.9+
Installation & Usage
pip install
pip install lakefs
Import the package
import lakefs
Getting Started
After installing the package, use the following snippet as a quick start:
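import lakefs
from lakefs.client import Client

# Option 1: use the default client, which authenticates with credentials
# inferred from the environment
repo = lakefs.repository(repository_id="my-repo")

# Option 2: create a Client explicitly and pass it to the repository
clt = Client(username="<lakefs_access_key_id>", password="<lakefs_secret_access_key>", host="<lakefs_endpoint>")
repo = lakefs.Repository(repository_id="my-repo", client=clt)

# Create the repository and get a handle to its main branch
main_branch = repo.create(storage_namespace="<storage_namespace>").branch(branch_id="main")
...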
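Because the default client infers identity from the environment, it can help to check which settings are present before calling lakefs.repository(...). The sketch below assumes the variable names used by lakectl (LAKECTL_SERVER_ENDPOINT_URL, LAKECTL_CREDENTIALS_ACCESS_KEY_ID, LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY). These names are not stated in this README, so confirm them against the lakeFS documentation. The helper missing_lakefs_settings is hypothetical and not part of the SDK.

```python
import os

# Assumed variable names (lakectl convention); verify against the lakeFS docs.
ASSUMED_ENV_VARS = (
    "LAKECTL_SERVER_ENDPOINT_URL",
    "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
    "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
)


def missing_lakefs_settings(env=None):
    """Return the assumed variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in ASSUMED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_lakefs_settings()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All assumed lakeFS settings are present in the environment.")
```

If any names are reported as not set, passing an explicit Client as in the snippet above avoids relying on the environment.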
Examples
Print sizes of all objects in lakefs://repo/main~2
ref = lakefs.Repository("repo").ref("main~2")
for obj in ref.objects():
    print(f"{obj.path}: {obj.size_bytes}")
Difference between two branches
for i in lakefs.Repository("repo").ref("main").diff("twig"):
    print(i)
Ref expressions can be used here as well; for instance,
.diff("main~2") also works. Ref expressions are the lakeFS analogue of
Git's revision syntax.
Search a stored object for a string
with lakefs.Repository("repo").ref("main").object("path/to/data").reader(mode="r") as f:
    for line in f:
        if "quick" in line:
            print(line)
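The loop above relies only on the reader yielding lines of text, so the search can be factored into a small helper that works on any iterable of lines. The sketch below uses io.StringIO as a stand-in for the lakeFS reader so that it runs without a server; find_lines is a hypothetical name, not an SDK function.

```python
import io


def find_lines(lines, needle):
    """Yield (line_number, line) for each line that contains needle."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


# Stand-in for the object reader; with lakeFS, pass the opened reader `f` instead.
sample = io.StringIO("the quick brown fox\njumps over\nthe lazy dog, quickly\n")
matches = list(find_lines(sample, "quick"))
for number, line in matches:
    print(f"{number}: {line}")
```

Returning line numbers alongside the text makes matches easier to locate in large objects.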
Upload and commit some data
# Write the object to the branch (not yet committed)
with lakefs.Repository("golden").branch("main").object("path/to/new").writer(mode="wb") as f:
    f.write(b"my data")

# Commit the uploaded data
lakefs.Repository("golden").branch("main").commit("added my data using lakeFS high-level SDK")

# Read the object back in text mode
with lakefs.Repository("golden").branch("main").object("path/to/new").reader(mode="r") as f:
    for line in f:
        print(line)
Unlike references, branches are writable. This example would not work with a ref, because a ref can only be read.
Tests
To run the tests using pytest, first clone the lakeFS Git repository:
git clone https://github.com/treeverse/lakeFS.git
cd lakeFS/clients/python-wrapper
Unit Tests
Inside the tests folder, execute pytest utests to run the unit tests.
Integration Tests
See the testing documentation for more information.
Documentation
lakeFS Python SDK
Author
services@treeverse.io