GraphFrames Python Package

<img src=https://raw.githubusercontent.com/graphframes/graphframes/refs/heads/master/docs/img/GraphFrames-Logo-Large.png width=500>
https://graphframes.io/
This is the official graphframes-py PyPI package, a Python wrapper for the Scala GraphFrames library.
This package is maintained by the GraphFrames project and is available on PyPI.
For instructions on GraphFrames, check the project README.md.
See Installation and Quick-Start for the best way to install and use GraphFrames.
Installation
pip install graphframes-py
NOTE! The Python distribution does not include the JVM core. You need to add it to your cluster or Spark Connect server!
Running graphframes-py
You should use GraphFrames via the --packages argument to pyspark or spark-submit, but this package is helpful in development environments.
$ pyspark --packages io.graphframes:graphframes-spark3_2.12:0.9.1
$ pyspark --packages io.graphframes:graphframes-spark4_2.13:0.9.1
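When the session is created from a Python script rather than the pyspark shell, the same Maven coordinate can be passed through the standard spark.jars.packages setting. The following is a minimal sketch, not the project's prescribed setup: the helper names graphframes_coordinate, build_session and demo are hypothetical, the default version simply mirrors the commands above, and the sample vertices and edges are made up for illustration.

```python
def graphframes_coordinate(spark_major: int, scala_version: str,
                           version: str = "0.9.1") -> str:
    """Build a Maven coordinate following the pattern of the commands above.

    Hypothetical helper: it only formats a string and does not check
    that the resulting artifact actually exists.
    """
    return f"io.graphframes:graphframes-spark{spark_major}_{scala_version}:{version}"


def build_session(coordinate: str):
    """Create a SparkSession that resolves the GraphFrames JVM package."""
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("graphframes-demo")
        # spark.jars.packages is the configuration counterpart of --packages.
        .config("spark.jars.packages", coordinate)
        .getOrCreate()
    )


def demo() -> None:
    """Build a tiny graph and show in-degrees (needs pyspark and graphframes-py)."""
    from graphframes import GraphFrame

    spark = build_session(graphframes_coordinate(3, "2.12"))
    # Illustrative data only: vertices need an "id" column,
    # edges need "src" and "dst" columns.
    vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
    edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])
    graph = GraphFrame(vertices, edges)
    graph.inDegrees.show()
    spark.stop()
```

Calling demo() runs the example end to end. Pick the Spark and Scala versions that match your installation, as in the two commands above.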
Documentation
Spark-Connect Note
GraphFrames for PySpark implicitly chooses the Connect or the classic implementation based on the result of is_remote().
To force the Connect-based implementation, export the environment variable SPARK_CONNECT_MODE_ENABLED=1.
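The same variable can also be set from Python before any Spark code runs. This is a rough sketch under two assumptions: is_remote() is importable from pyspark.sql.utils (true for the PySpark releases that ship Spark Connect, but worth checking for your version), and the variable is set before pyspark or graphframes is imported. The function names force_connect_mode and active_implementation are hypothetical.

```python
import os


def force_connect_mode() -> None:
    """Set the variable named above so the Connect-based path is selected."""
    # Assumption: this runs before pyspark or graphframes is imported.
    os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"


def active_implementation() -> str:
    """Report which implementation the is_remote() check would select."""
    # Imported lazily; assumes is_remote lives in pyspark.sql.utils.
    from pyspark.sql.utils import is_remote

    return "connect" if is_remote() else "classic"
```

Call force_connect_mode() at the top of a script, then active_implementation() to confirm which path is in effect.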