
reladiff
Command-line tool and Python library to efficiently diff rows across two different databases.
Reladiff is a high-performance tool and library designed for diffing large datasets across databases. By executing the diff calculation within the database itself, Reladiff minimizes data transfer and achieves optimal performance.
This tool is specifically tailored for data professionals, DevOps engineers, and system administrators.
Reladiff is free, open-source, user-friendly, extensively tested, and delivers fast results, even at massive scale.
Cross-Database Diff: Reladiff employs a divide-and-conquer algorithm, based on matching hashes, to efficiently identify modified segments and download only the necessary data for comparison. This approach ensures exceptional performance when differences are minimal.
⇄ Diffs across over a dozen different databases (e.g. PostgreSQL -> Snowflake)!
🧠 Gracefully handles reduced precision (e.g., timestamp(9) -> timestamp(3)) by rounding according to the database specification.
🔥 Benchmarked to diff over 25M rows in under 10 seconds and over 1B rows in approximately 5 minutes, given no differences.
♾️ Capable of handling tables with tens of billions of rows.
Intra-Database Diff: When both tables reside in the same database, Reladiff compares them using a join operation, with additional optimizations for enhanced speed.
Threaded: Utilizes multiple threads to significantly boost performance during diffing operations.
Configurable: Offers numerous options for power-users to customize and optimize their usage.
Automation-Friendly: Outputs both JSON and git-like diffs (with + and -), facilitating easy integration into CI/CD pipelines.
Over a dozen databases supported: MySQL, PostgreSQL, Snowflake, BigQuery, Oracle, ClickHouse, and more. See the full list in the documentation.
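The divide-and-conquer idea behind the cross-database diff can be illustrated with a small, self-contained sketch. This is not Reladiff's implementation: the in-memory dictionaries stand in for database tables, and the names `segment_checksum`, `diff_segment`, and `threshold` are hypothetical. It only shows how matching checksums let whole key ranges be skipped, so rows are compared directly only in small segments that actually differ.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[str, ...]
Table = Dict[int, Row]  # primary key -> remaining columns (stand-in for a DB table)


def segment_checksum(table: Table, lo: int, hi: int) -> str:
    """Hash every row whose key falls in [lo, hi), in key order."""
    digest = hashlib.md5()
    for key in sorted(k for k in table if lo <= k < hi):
        digest.update(repr((key, table[key])).encode())
    return digest.hexdigest()


def diff_segment(a: Table, b: Table, lo: int, hi: int,
                 threshold: int = 4) -> List[Tuple[str, int, Row]]:
    """Return ('-', key, row) for rows only in a and ('+', key, row) for rows only in b."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # checksums match: skip the whole segment
    if hi - lo <= threshold:
        # Small segment: fetch and compare the rows directly.
        rows_a = {(k, a[k]) for k in a if lo <= k < hi}
        rows_b = {(k, b[k]) for k in b if lo <= k < hi}
        out = [("-", k, r) for k, r in rows_a - rows_b]
        out += [("+", k, r) for k, r in rows_b - rows_a]
        # Order by key, with '-' before '+' for the same key.
        return sorted(out, key=lambda item: (item[1], item[0] == "+"))
    mid = (lo + hi) // 2  # otherwise split the key range and recurse
    return (diff_segment(a, b, lo, mid, threshold)
            + diff_segment(a, b, mid, hi, threshold))


# Made-up data: one modified row and one missing row out of 100.
source = {i: (f"user{i}",) for i in range(100)}
target = dict(source)
target[42] = ("renamed",)
del target[7]

result = diff_segment(source, target, 0, 100)
for sign, key, row in result:
    print(sign, key, row)
```

In this sketch most of the key space is dismissed after a single checksum comparison, which is why the approach performs best when differences are few.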
Reladiff is a fork of an archived project called data-diff.
🗎 Read the Documentation - our detailed documentation has everything you need to start diffing.
For the impatient ;)
Reladiff is available on PyPI. You may install it by running:
pip install reladiff
Requires Python 3.8+ with pip.
We recommend installing it in a virtual environment.
Once you've installed Reladiff, you can run it from the command-line:
# Cross-DB diff, using hashes
reladiff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
When both tables belong to the same database, a shorter syntax is available:
# Same-DB diff, using outer join
reladiff DB1_URI TABLE1_NAME TABLE2_NAME [OPTIONS]
Or, you can import and run it from Python:
from typing import Literal, Tuple  # typing.Tuple keeps the annotation valid on Python 3.8

from reladiff import connect_to_table, diff_tables

# Arguments: database URI, table name, key column
table1 = connect_to_table("postgresql:///", "table_name", "id")
table2 = connect_to_table("mysql:///", "table_name", "id")

sign: Literal['+', '-']
row: Tuple[str, ...]
for sign, row in diff_tables(table1, table2):
    print(sign, row)
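For automation, the `(sign, row)` pairs can be reduced to a small JSON summary. The sketch below is only an illustration: `summarize_diff` is a hypothetical helper, not part of Reladiff, and `sample_diff` is made-up data standing in for the iterator returned by `diff_tables`, so that the example runs without a database.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_diff(diff: Iterable[Tuple[str, Tuple[str, ...]]]) -> dict:
    """Count '+' and '-' rows in a stream of (sign, row) pairs."""
    counts = Counter(sign for sign, _row in diff)
    return {
        "plus_rows": counts["+"],
        "minus_rows": counts["-"],
        "identical": counts["+"] == 0 and counts["-"] == 0,
    }


# Stand-in for diff_tables(table1, table2); the rows are invented.
sample_diff = [
    ("-", ("1", "alice")),
    ("+", ("1", "alicia")),
    ("+", ("2", "bob")),
]

summary = summarize_diff(sample_diff)
print(json.dumps(summary))
```

A CI job could, for instance, fail the build whenever `identical` is false.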
Read our detailed instructions in the documentation. An example cross-database diff between PostgreSQL and Snowflake:
# -k: identifier of event
# -c: extra column to compare
# -w: filter the rows on both dbs
reladiff \
  postgresql:/// \
  events \
  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
  events \
  -k event_id \
  -c event_data \
  -w "event_time < '2024-10-10'"
This same-database diff materializes the results into a new table whose name contains the current timestamp:
reladiff \
  postgresql:/// events old_events \
  -k org_id \
  -c created_at -c is_internal \
  -w "org_id != 1 and org_id < 2000" \
  -m test_results_%t \
  --materialize-all-rows \
  --table-write-limit 10000
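To give a feel for how a join can diff two tables that live in the same database, here is a minimal sketch using Python's built-in `sqlite3`. It is an assumption-laden illustration, not the query Reladiff generates: the table contents are invented, and a full outer join is emulated with two left joins so the example also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (org_id INTEGER PRIMARY KEY, is_internal INTEGER);
    CREATE TABLE old_events (org_id INTEGER PRIMARY KEY, is_internal INTEGER);
    INSERT INTO events VALUES (2, 0), (3, 1), (4, 0);
    INSERT INTO old_events VALUES (2, 0), (3, 0), (5, 1);
""")

# Rows with no exact match on the other side are reported with a sign.
# "IS" is SQLite's NULL-safe equality operator.
QUERY = """
    SELECT '-' AS sign, a.org_id, a.is_internal
    FROM events AS a
    LEFT JOIN old_events AS b
      ON a.org_id = b.org_id AND a.is_internal IS b.is_internal
    WHERE b.org_id IS NULL
    UNION ALL
    SELECT '+' AS sign, b.org_id, b.is_internal
    FROM old_events AS b
    LEFT JOIN events AS a
      ON a.org_id = b.org_id AND a.is_internal IS b.is_internal
    WHERE a.org_id IS NULL
"""

# Sort by key, with '-' before '+' for the same key.
rows = sorted(conn.execute(QUERY).fetchall(),
              key=lambda r: (r[1], r[0] == "+"))
for sign, org_id, is_internal in rows:
    print(sign, org_id, is_internal)
```

Because the comparison runs entirely inside the database, only the differing rows are returned to the client.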
Check out this technical explanation of how cross-database reladiff works.
Confused? Got a cool idea? Just want to share your thoughts? Let's discuss it in GitHub Discussions.
Did you encounter a bug? Open an issue.
Big thanks to everyone who has contributed so far.
This project is licensed under the terms of the MIT License.