PyStarburst DataFrame API
The PyStarburst DataFrame API lets you query and transform data in Starburst products as part of a data pipeline, without downloading the data locally.
Documentation
See the PyStarburst API documentation and the examples repository.
Getting started
Install pystarburst
pip install pystarburst
Connect to a Starburst server
The connection parameters are the same as those accepted by the Trino Python client.
from pystarburst import Session
connection_parameters = {
"host": "localhost",
"port": 8080,
"user": "admin",
"catalog": "tpch",
"schema": "tiny"
}
session = Session.builder.configs(connection_parameters).create()
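In a pipeline you may not want to hard-code the connection details. The sketch below shows one way to build the same dictionary from environment variables. The `STARBURST_*` variable names and the `connection_parameters_from_env` helper are hypothetical, chosen for this illustration only; the dictionary keys and default values are the ones used in the example above.

```python
import os


def connection_parameters_from_env(env=None):
    """Build a connection-parameter dict from environment variables.

    The STARBURST_* names are assumptions made for this sketch, not
    variables that PyStarburst itself reads. Defaults mirror the
    example above.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    return {
        "host": env.get("STARBURST_HOST", "localhost"),
        # Environment values are strings, so convert the port to an integer.
        "port": int(env.get("STARBURST_PORT", "8080")),
        "user": env.get("STARBURST_USER", "admin"),
        "catalog": env.get("STARBURST_CATALOG", "tpch"),
        "schema": env.get("STARBURST_SCHEMA", "tiny"),
    }
```

The returned dictionary can be passed to `Session.builder.configs(...)` in the same way as `connection_parameters` above.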
Using SQL
from pystarburst import Session
session = Session.builder.configs({ ... }).create()
session.sql("SELECT 1 as a").show()
Querying a table
from pystarburst import Session
session = Session.builder.configs({ ... }).create()
df = session.table("nation")
print(df.schema)
df.show()
Filtering a data frame
from pystarburst import Session
session = Session.builder.configs({ ... }).create()
df = session.table("nation")
df.filter(df.col("regionkey") == 0).show()
Joining data frames
from pystarburst import Session
session = Session.builder.configs({ ... }).create()
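nation = session.table("nation")
region = session.table("region")
# Join each nation to its region on the shared regionkey column
nation.join(region, nation.col("regionkey") == region.col("regionkey")).show()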
Aggregation
from pystarburst import Session
from pystarburst.functions import col
session = Session.builder.configs({ ... }).create()
df = session.table("nation")
df.agg((col("regionkey"), "max"), (col("regionkey"), "avg")).show()