pytest-spark
############
.. image:: https://travis-ci.org/malexer/pytest-spark.svg?branch=master
    :target: https://travis-ci.org/malexer/pytest-spark
pytest_ plugin to run tests with support for pyspark (`Apache Spark`_).
This plugin lets you specify the SPARK_HOME directory in ``pytest.ini``,
making "pyspark" importable in the tests executed by pytest.
You can also define "spark_options" in ``pytest.ini`` to customize pyspark,
including the "spark.jars.packages" option, which loads external
libraries (e.g. "com.databricks:spark-xml").
pytest-spark provides the session-scoped fixtures ``spark_context`` and
``spark_session``, which can be used in your tests.
Note: there is no need to define SPARK_HOME if you've installed pyspark
using pip (e.g. ``pip install pyspark``) - it should already be importable.
In this case just don't define SPARK_HOME either in pytest
(pytest.ini / --spark_home) or as an environment variable.
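For illustration, the pip-only workflow can then be as simple as the
following sketch (no SPARK_HOME defined anywhere):

.. code-block:: shell

    $ pip install pyspark pytest-spark
    $ pytest   # pyspark is already importable; no SPARK_HOME needed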
Install
=======

.. code-block:: shell

    $ pip install pytest-spark
To run tests with the required ``spark_home`` location, you need to define
it using one of the following methods:
1. Specify the command line option "--spark_home"::

    $ pytest --spark_home=/opt/spark
Add "spark_home" value to pytest.ini
in your project directory::
[pytest] spark_home = /opt/spark
3. Set the "SPARK_HOME" environment variable (see the example below).
pytest-spark will try to import ``pyspark`` from the provided location.
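For the environment-variable method, a typical shell session might look
like this (the path is illustrative):

.. code-block:: shell

    $ export SPARK_HOME=/opt/spark
    $ pytest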
.. note::
    "spark_home" will be read in the order specified above, i.e. you can
    override the ``pytest.ini`` value with the command line option.
Customize spark_options
=======================

Just define "spark_options" in your ``pytest.ini``, e.g.::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.app.name: my-pytest-spark-tests
        spark.executor.instances: 1
        spark.jars.packages: com.databricks:spark-xml_2.12:0.5.0
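For intuition, these options correspond roughly to the following manual
setup in plain pyspark (an illustrative sketch; the plugin does this
wiring for you):

Example::

    from pyspark.sql import SparkSession

    # Rough manual equivalent of the spark_options above (illustrative only).
    spark = (
        SparkSession.builder
        .appName("my-pytest-spark-tests")
        .config("spark.executor.instances", "1")
        .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.5.0")
        .getOrCreate()
    )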
spark_context fixture
=====================

Use fixture ``spark_context`` in your tests as a regular pyspark fixture.
A SparkContext instance will be created once and reused for the whole test
session.
Example::

    def test_my_case(spark_context):
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        # ...
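A complete test built on this fixture might look like the following sketch
(the data and the expected sum are illustrative):

Example::

    def test_rdd_sum(spark_context):
        # Distribute a small dataset and compute its sum on Spark.
        test_rdd = spark_context.parallelize([1, 2, 3, 4])
        assert test_rdd.sum() == 10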
spark_session fixture (Spark 2.0 and above)
===========================================

Use fixture ``spark_session`` in your tests as a regular pyspark fixture.
A SparkSession instance with Hive support enabled will be created once and
reused for the whole test session.
Example::

    def test_spark_session_dataframe(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        # ...
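As with ``spark_context``, a full test might assert on the resulting
DataFrame (an illustrative sketch; the test name is made up):

Example::

    def test_dataframe_filter(spark_session):
        test_df = spark_session.createDataFrame([[1, 3], [2, 4]], "a: int, b: int")
        # Rows where column "a" exceeds 1: only (2, 4) qualifies.
        assert test_df.filter(test_df.a > 1).count() == 1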
Overriding default parameters of the spark_session fixture
===========================================================

By default ``spark_session`` will be loaded with the following configuration:
Example::

    {
        'spark.app.name': 'pytest-spark',
        'spark.default.parallelism': 1,
        'spark.dynamicAllocation.enabled': 'false',
        'spark.executor.cores': 1,
        'spark.executor.instances': 1,
        'spark.io.compression.codec': 'lz4',
        'spark.rdd.compress': 'false',
        'spark.sql.shuffle.partitions': 1,
        'spark.shuffle.compress': 'false',
        'spark.sql.catalogImplementation': 'hive',
    }
You can override some of these parameters in your ``pytest.ini``.
For example, to remove Hive support from the Spark session:

Example::

    [pytest]
    spark_home = /opt/spark
    spark_options =
        spark.sql.catalogImplementation: in-memory
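To verify that the override took effect, a test could read the setting
back from the live session (an illustrative sketch; the test name is made
up):

Example::

    def test_catalog_is_in_memory(spark_session):
        # spark_session.conf.get reads the active runtime configuration.
        assert spark_session.conf.get("spark.sql.catalogImplementation") == "in-memory"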
Run tests locally::

    $ docker-compose up --build
.. _pytest: http://pytest.org/
.. _Apache Spark: https://spark.apache.org/