PyPMML-Spark
PyPMML-Spark is a Python PMML scoring library for PySpark, exposed as a SparkML Transformer. It is the Python API for PMML4S-Spark.
Prerequisites
- Java >= 1.8
- Python 2.7 or >= 3.5
Dependencies
Installation
pip install pypmml-spark
Or install the latest version from GitHub:
pip install --upgrade git+https://github.com/autodeployai/pypmml-spark.git
After installation, Spark must be able to find the jars shipped in the package pypmml_spark.jars. There are several ways to do that:
- The easiest way is to run the script link_pmml4s_jars_into_spark.py, which is delivered with pypmml-spark:
link_pmml4s_jars_into_spark.py
- Use Spark's configuration options to specify the dependent jars, e.g. --jars, or spark.executor.extraClassPath and spark.driver.extraClassPath. See the Spark documentation for details about those parameters.
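As an illustration of the configuration-based option, the jars can also be passed when the session is created from Python. The following is a minimal sketch and not part of pypmml-spark itself: the helper names list_package_jars and build_session are hypothetical, and it assumes the jars sit as *.jar files directly inside the jars directory of the installed pypmml_spark package. spark.jars is the Spark property that corresponds to --jars.

```python
import importlib.util
from pathlib import Path


def list_package_jars(package_name="pypmml_spark", subdir="jars"):
    """Return sorted absolute paths of the *.jar files bundled in a package.

    Assumption: the jars live directly under <package>/<subdir>.
    The package is located without being imported.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or not a package with a directory
    jars = []
    for location in spec.submodule_search_locations:
        jar_dir = Path(location, subdir)
        if jar_dir.is_dir():
            jars.extend(str(p.resolve()) for p in jar_dir.glob("*.jar"))
    return sorted(jars)


def build_session(app_name="pypmml-spark-demo"):
    """Create a SparkSession whose spark.jars lists the bundled jars."""
    # Imported lazily so the helper above can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    jars = list_package_jars()
    if jars:
        # spark.jars takes a comma-separated list of jar paths.
        builder = builder.config("spark.jars", ",".join(jars))
    return builder.getOrCreate()
```

Calling build_session() before loading any model would then stand in for running the linking script; whether this is sufficient for a given cluster setup depends on how Spark is deployed.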
Usage
- Load the model from one of several sources, e.g. a filename, a string, or an array of bytes.
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('single_iris_dectree.xml')
- Call transform(dataset) to run batch scoring against an input dataset.
df = spark.read.csv('Iris.csv', header=True)
score_df = model.transform(df)
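The loading step mentions strings and byte arrays as sources in addition to filenames. The sketch below shows one way to dispatch on the kind of source. Only ScoreModel.fromFile appears in this README; the names fromString and fromBytes are assumptions about the string and byte-array loaders and should be checked against the installed version. pick_loader and load_model are hypothetical helpers.

```python
import os


def pick_loader(source):
    """Map a PMML source to the name of the ScoreModel loader to call.

    fromFile is taken from the usage example; fromString and fromBytes
    are ASSUMED names for the string and byte-array loaders.
    """
    if isinstance(source, (bytes, bytearray)):
        return "fromBytes"
    if isinstance(source, str) and source.lstrip().startswith("<"):
        # Text that starts with '<' is treated as inline PMML (XML).
        return "fromString"
    if isinstance(source, (str, os.PathLike)):
        return "fromFile"
    raise TypeError("unsupported PMML source: %r" % type(source))


def load_model(source):
    """Load a ScoreModel from a path, an XML string, or raw bytes."""
    # Imported lazily: requires pypmml-spark and a working Spark setup.
    from pypmml_spark import ScoreModel

    name = pick_loader(source)
    if name == "fromFile":
        source = os.fspath(source)  # accept pathlib.Path as well as str
    return getattr(ScoreModel, name)(source)
```

With this in place, load_model('single_iris_dectree.xml') would go through fromFile, while the contents of the same file read into a string or bytes object would be routed to the assumed string or byte loaders.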
Use PMML in Scala or Java
See the PMML4S project. PMML4S is a PMML scoring library for Scala. It provides both Scala and Java Evaluator APIs for PMML.
Use PMML in Python
See the PyPMML project. PyPMML is a Python PMML scoring library. It is the Python API for PMML4S.
Use PMML in Spark
See the PMML4S-Spark project. PMML4S-Spark is a PMML scoring library for Spark, exposed as a SparkML Transformer.
Deploy PMML as REST API
See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX through both HTTP (REST API) and gRPC endpoints.
Support
If you have any questions about the PyPMML-Spark library, please open an issue on this repository.
Feedback and contributions to the project, no matter what kind, are always very welcome.
License
PyPMML-Spark is licensed under the Apache License 2.0.