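pl-compare is useful if you find yourself writing ad hoc SQL or DataFrame operations to find the differences between two tables.
With pl-compare you can check whether two Polars DataFrames match, and summarise or sample their schema, row, and value differences, as the examples below show.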
pip install pl_compare
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> print("is_schemas_equal:", compare_result.is_schemas_equal())
is_schemas_equal: False
>>> print("is_rows_equal:", compare_result.is_rows_equal())
is_rows_equal: False
>>> print("is_values_equal:", compare_result.is_values_equal())
is_values_equal: False
>>>
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> print("schemas_summary()")
schemas_summary()
>>> print(compare_result.schemas_summary())
shape: (6, 2)
┌─────────────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════════════════════════════╪═══════╡
│ Columns in base ┆ 3 │
│ Columns in compare ┆ 4 │
│ Columns in base and compare ┆ 3 │
│ Columns only in base ┆ 0 │
│ Columns only in compare ┆ 1 │
│ Columns with schema difference... ┆ 1 │
└─────────────────────────────────┴───────┘
>>> print("schemas_sample()")
schemas_sample()
>>> print(compare_result.schemas_sample())
shape: (2, 3)
┌──────────┬─────────────┬────────────────┐
│ column ┆ base_format ┆ compare_format │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪═════════════╪════════════════╡
│ Example2 ┆ String ┆ Int64 │
│ Example3 ┆ null ┆ Int64 │
└──────────┴─────────────┴────────────────┘
>>>
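The schema summary above can be pictured as set operations on column names plus a dtype check on the shared columns. The sketch below reproduces the same counts with plain dictionaries. It is an illustration only, not pl-compare's implementation; `schema_diff` and the dtype strings are hypothetical names chosen for this example.

```python
def schema_diff(base_schema: dict, compare_schema: dict) -> dict:
    """Classify columns by comparing two {column: dtype} mappings."""
    base_cols, compare_cols = set(base_schema), set(compare_schema)
    shared = base_cols & compare_cols
    return {
        "only_in_base": sorted(base_cols - compare_cols),
        "only_in_compare": sorted(compare_cols - base_cols),
        "in_both": sorted(shared),
        # Shared columns whose dtype differs between the two tables.
        "dtype_differs": sorted(
            col for col in shared if base_schema[col] != compare_schema[col]
        ),
    }


# Schemas mirroring the example DataFrames (dtype names written as strings).
base_schema = {"ID": "String", "Example1": "Int64", "Example2": "String"}
compare_schema = {
    "ID": "String",
    "Example1": "Int64",
    "Example2": "Int64",
    "Example3": "Int64",
}

schema_result = schema_diff(base_schema, compare_schema)
print(schema_result)
```

Here `Example3` is reported as only in compare and `Example2` as a dtype difference, which lines up with the "Columns only in compare" and schema-difference counts of 1 in the summary.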
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> print("rows_summary()")
rows_summary()
>>> print(compare_result.rows_summary())
shape: (5, 2)
┌──────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════════════════════════╪═══════╡
│ Rows in base ┆ 3 │
│ Rows in compare ┆ 3 │
│ Rows only in base ┆ 1 │
│ Rows only in compare ┆ 1 │
│ Rows in base and compare ┆ 2 │
└──────────────────────────┴───────┘
>>> print("rows_sample()")
rows_sample()
>>> print(compare_result.rows_sample())
shape: (2, 3)
┌────────────┬──────────┬─────────────────┐
│ ID ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════════╪══════════╪═════════════════╡
│ 12345678 ┆ status ┆ in base only │
│ 1234567810 ┆ status ┆ in compare only │
└────────────┴──────────┴─────────────────┘
>>>
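The row summary above is consistent with comparing the sets of ID values from each table. A minimal sketch of that idea, assuming a single ID column with unique values (`row_diff` is a hypothetical helper, not part of pl-compare):

```python
def row_diff(base_ids, compare_ids):
    """Split ID values into base-only, compare-only, and shared groups."""
    base_set, compare_set = set(base_ids), set(compare_ids)
    return {
        "only_in_base": sorted(base_set - compare_set),
        "only_in_compare": sorted(compare_set - base_set),
        "in_both": sorted(base_set & compare_set),
    }


# ID values taken from the example DataFrames.
row_result = row_diff(
    ["123456", "1234567", "12345678"],
    ["123456", "1234567", "1234567810"],
)
print(row_result)
```

The two shared IDs correspond to "Rows in base and compare", and the remaining ID on each side matches the two rows shown by `rows_sample()`.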
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> print("values_summary()")
values_summary()
>>> print(compare_result.values_summary())
shape: (2, 3)
┌─────────────────────────┬───────┬────────────┐
│ Value Differences ┆ Count ┆ Percentage │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════════════════════════╪═══════╪════════════╡
│ Total Value Differences ┆ 1 ┆ 50.0 │
│ Example1 ┆ 1 ┆ 50.0 │
└─────────────────────────┴───────┴────────────┘
>>> print("values_sample()")
values_sample()
>>> print(compare_result.values_sample())
shape: (1, 4)
┌─────────┬──────────┬──────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪══════╪═════════╡
│ 1234567 ┆ Example1 ┆ 6 ┆ 2 │
└─────────┴──────────┴──────┴─────────┘
>>>
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> compare_result.report()
--------------------------------------------------------------------------------
COMPARISON REPORT
--------------------------------------------------------------------------------
<BLANKLINE>
SCHEMA DIFFERENCES:
shape: (6, 2)
┌─────────────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════════════════════════════╪═══════╡
│ Columns in base ┆ 3 │
│ Columns in compare ┆ 4 │
│ Columns in base and compare ┆ 3 │
│ Columns only in base ┆ 0 │
│ Columns only in compare ┆ 1 │
│ Columns with schema difference... ┆ 1 │
└─────────────────────────────────┴───────┘
shape: (2, 3)
┌──────────┬─────────────┬────────────────┐
│ column ┆ base_format ┆ compare_format │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪═════════════╪════════════════╡
│ Example2 ┆ String ┆ Int64 │
│ Example3 ┆ null ┆ Int64 │
└──────────┴─────────────┴────────────────┘
--------------------------------------------------------------------------------
<BLANKLINE>
ROW DIFFERENCES:
shape: (5, 2)
┌──────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════════════════════════╪═══════╡
│ Rows in base ┆ 3 │
│ Rows in compare ┆ 3 │
│ Rows only in base ┆ 1 │
│ Rows only in compare ┆ 1 │
│ Rows in base and compare ┆ 2 │
└──────────────────────────┴───────┘
shape: (2, 3)
┌────────────┬──────────┬─────────────────┐
│ ID ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════════╪══════════╪═════════════════╡
│ 12345678 ┆ status ┆ in base only │
│ 1234567810 ┆ status ┆ in compare only │
└────────────┴──────────┴─────────────────┘
--------------------------------------------------------------------------------
<BLANKLINE>
VALUE DIFFERENCES:
shape: (2, 3)
┌─────────────────────────┬───────┬────────────┐
│ Value Differences ┆ Count ┆ Percentage │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════════════════════════╪═══════╪════════════╡
│ Total Value Differences ┆ 1 ┆ 50.0 │
│ Example1 ┆ 1 ┆ 50.0 │
└─────────────────────────┴───────┴────────────┘
shape: (1, 4)
┌─────────┬──────────┬──────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪══════╪═════════╡
│ 1234567 ┆ Example1 ┆ 6 ┆ 2 │
└─────────┴──────────┴──────┴─────────┘
--------------------------------------------------------------------------------
End of Report
--------------------------------------------------------------------------------
>>>
>>> import polars as pl
>>> import pandas as pd # doctest: +SKIP
>>> from pl_compare import compare
>>>
>>> base_df = pd.DataFrame(data=
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )# doctest: +SKIP
>>> compare_df = pd.DataFrame(data=
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )# doctest: +SKIP
>>>
>>> compare_result = compare(["ID"], pl.from_pandas(base_df), pl.from_pandas(compare_df))# doctest: +SKIP
>>> compare_result.report()# doctest: +SKIP
--------------------------------------------------------------------------------
COMPARISON REPORT
--------------------------------------------------------------------------------
SCHEMA DIFFERENCES:
shape: (6, 2)
┌─────────────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════════════════════════════╪═══════╡
│ Columns in base ┆ 3 │
│ Columns in compare ┆ 4 │
│ Columns in base and compare ┆ 3 │
│ Columns only in base ┆ 0 │
│ Columns only in compare ┆ 1 │
│ Columns with schema differences ┆ 1 │
└─────────────────────────────────┴───────┘
shape: (2, 3)
┌──────────┬─────────────┬────────────────┐
│ column ┆ base_format ┆ compare_format │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪═════════════╪════════════════╡
│ Example2 ┆ String ┆ Int64 │
│ Example3 ┆ null ┆ Int64 │
└──────────┴─────────────┴────────────────┘
--------------------------------------------------------------------------------
ROW DIFFERENCES:
shape: (5, 2)
┌──────────────────────────┬───────┐
│ Statistic ┆ Count │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════════════════════════╪═══════╡
│ Rows in base ┆ 3 │
│ Rows in compare ┆ 3 │
│ Rows only in base ┆ 1 │
│ Rows only in compare ┆ 1 │
│ Rows in base and compare ┆ 2 │
└──────────────────────────┴───────┘
shape: (2, 3)
┌────────────┬──────────┬─────────────────┐
│ ID ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞════════════╪══════════╪═════════════════╡
│ 12345678 ┆ status ┆ in base only │
│ 1234567810 ┆ status ┆ in compare only │
└────────────┴──────────┴─────────────────┘
--------------------------------------------------------------------------------
VALUE DIFFERENCES:
shape: (2, 3)
┌─────────────────────────┬───────┬────────────┐
│ Value Differences ┆ Count ┆ Percentage │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════════════════════════╪═══════╪════════════╡
│ Total Value Differences ┆ 1 ┆ 50.0 │
│ Example1 ┆ 1 ┆ 50.0 │
└─────────────────────────┴───────┴────────────┘
shape: (1, 4)
┌─────────┬──────────┬──────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪══════╪═════════╡
│ 1234567 ┆ Example1 ┆ 6 ┆ 2 │
└─────────┴──────────┴──────┴─────────┘
--------------------------------------------------------------------------------
End of Report
--------------------------------------------------------------------------------
>>>
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1.111, 6.11, 3.11],
... }
... )
>>>
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1.114, 6.14, 3.12],
... },
... )
>>>
>>> print("With resolution of 0.01")
With resolution of 0.01
>>> compare_result = compare(["ID"], base_df, compare_df, resolution=0.01)
>>> print(compare_result.values_sample())
shape: (1, 4)
┌─────────┬──────────┬──────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ f64 │
╞═════════╪══════════╪══════╪═════════╡
│ 1234567 ┆ Example1 ┆ 6.11 ┆ 6.14 │
└─────────┴──────────┴──────┴─────────┘
>>> print("With no resolution")
With no resolution
>>> compare_result = compare(["ID"], base_df, compare_df)
>>> print(compare_result.values_sample())
shape: (2, 4)
┌─────────┬──────────┬───────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ f64 │
╞═════════╪══════════╪═══════╪═════════╡
│ 123456 ┆ Example1 ┆ 1.111 ┆ 1.114 │
│ 1234567 ┆ Example1 ┆ 6.11 ┆ 6.14 │
└─────────┴──────────┴───────┴─────────┘
>>>
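The effect of `resolution` in the output above can be reproduced with an absolute-difference tolerance. The sketch below is one plausible reading, not pl-compare's actual logic: it assumes two values count as different only when their absolute difference exceeds the resolution. How the library treats a difference exactly equal to the resolution is not shown in the example, and `differing_values` is a hypothetical helper.

```python
def differing_values(base, compare, resolution=None):
    """Return (id, base_value, compare_value) for shared IDs whose values differ."""
    diffs = []
    for key in sorted(base.keys() & compare.keys()):
        b, c = base[key], compare[key]
        if resolution is None:
            differs = b != c  # exact comparison
        else:
            differs = abs(b - c) > resolution  # assumed tolerance rule
        if differs:
            diffs.append((key, b, c))
    return diffs


# Example1 values keyed by ID, taken from the example DataFrames.
base_values = {"123456": 1.111, "1234567": 6.11, "12345678": 3.11}
compare_values = {"123456": 1.114, "1234567": 6.14, "1234567810": 3.12}

with_tolerance = differing_values(base_values, compare_values, resolution=0.01)
exact = differing_values(base_values, compare_values)
print(with_tolerance)
print(exact)
```

With the tolerance, only ID `1234567` (difference of about 0.03) is reported. Without it, ID `123456` (difference of about 0.003) is reported as well. The values 3.11 and 3.12 are never compared because their IDs do not match.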
>>> import polars as pl
>>> from pl_compare import compare
>>>
>>> base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": ["1", "2", "3"],
... }
... )
>>> compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "1234567810"],
... "Example1": [1, 2, 3],
... "Example2": [1, 2, 3],
... "Example3": [1, 2, 3],
... },
... )
>>>
>>> compare_result = compare(["ID"],
... base_df,
... compare_df,
... base_alias="before_change",
... compare_alias="after_change")
>>>
>>> print("schemas_sample()")
schemas_sample()
>>> print(compare_result.schemas_sample())
shape: (2, 3)
┌──────────┬──────────────────────┬─────────────────────┐
│ column ┆ before_change_format ┆ after_change_format │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════════════════════╪═════════════════════╡
│ Example2 ┆ String ┆ Int64 │
│ Example3 ┆ null ┆ Int64 │
└──────────┴──────────────────────┴─────────────────────┘
>>> print("values_sample()")
values_sample()
>>> print(compare_result.values_sample())
shape: (1, 4)
┌─────────┬──────────┬───────────────┬──────────────┐
│ ID ┆ variable ┆ before_change ┆ after_change │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪═══════════════╪══════════════╡
│ 1234567 ┆ Example1 ┆ 6 ┆ 2 │
└─────────┴──────────┴───────────────┴──────────────┘
>>>
>>> import polars as pl
>>> import pytest
>>> from pl_compare.compare import compare
>>>
>>> def test_example():
... base_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 3],
... "Example2": [1, 2, 3],
... }
... )
... compare_df = pl.DataFrame(
... {
... "ID": ["123456", "1234567", "12345678"],
... "Example1": [1, 6, 9],
... "Example2": [1, 2, 3],
... }
... )
... comparison = compare(["ID"], base_df, compare_df)
... if not comparison.is_equal():
... raise Exception(comparison.report())
...
>>> test_example() # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 18, in test_example
Exception: --------------------------------------------------------------------------------
COMPARISON REPORT
--------------------------------------------------------------------------------
No Schema differences found.
--------------------------------------------------------------------------------
No Row differences found (when joining by the supplied id_columns).
--------------------------------------------------------------------------------
VALUE DIFFERENCES:
shape: (3, 3)
┌─────────────────────────┬───────┬────────────┐
│ Value Differences ┆ Count ┆ Percentage │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════════════════════════╪═══════╪════════════╡
│ Total Value Differences ┆ 1 ┆ 16.666667 │
│ Example1 ┆ 1 ┆ 33.333333 │
│ Example2 ┆ 0 ┆ 0.0 │
└─────────────────────────┴───────┴────────────┘
shape: (1, 4)
┌──────────┬──────────┬──────┬─────────┐
│ ID ┆ variable ┆ base ┆ compare │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞══════════╪══════════╪══════╪═════════╡
│ 12345678 ┆ Example1 ┆ 3 ┆ 9 │
└──────────┴──────────┴──────┴─────────┘
--------------------------------------------------------------------------------
End of Report
--------------------------------------------------------------------------------
>>>
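The percentages in the value summary above are consistent with dividing each difference count by the number of values compared: 3 shared rows per column, and 3 rows times 2 value columns for the total. The arithmetic below is a worked check under that assumption, not a statement of how pl-compare computes the figures; `percentage` is a hypothetical helper.

```python
def percentage(diff_count, compared_count):
    """Share of compared values that differ, as a percentage rounded to 6 places."""
    return round(100 * diff_count / compared_count, 6)


shared_rows = 3  # all three IDs appear in both tables
diff_counts = {"Example1": 1, "Example2": 0}  # counts from the summary table

# Per-column percentage: differences divided by the number of shared rows.
per_column = {col: percentage(n, shared_rows) for col, n in diff_counts.items()}

# Total percentage: all differences divided by all compared values.
total = percentage(sum(diff_counts.values()), shared_rows * len(diff_counts))

print(per_column)
print(total)
```

The same reasoning fits the earlier example, where 2 shared rows and one compared column give 1 / 2 = 50.0 for both the column and the total.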
A tool to find the differences between two tables.