fast-diff-match-patch
Packages the C++ implementation of google-diff-match-patch for Python for fast byte and string diffs.
This is a Python 3.6+ package that wraps google-diff-match-patch's C++ implementation for performing very fast string comparisons. This package was previously known as diff_match_patch_python.
google-diff-match-patch is a Google library for computing differences between text files (http://code.google.com/p/google-diff-match-patch). There are implementations in various languages. Although there is a Python port, it's slow on very large documents, and I have a need for speed. I wanted to use the C++ implementation, but I'm a Python guy so I'd prefer to use it from Python.
Google's library depends on Qt 4, so some other folks rewrote it using the standard C++ library classes instead, making it more portable. That's at https://github.com/leutloff/diff-match-patch-cpp-stl. This package uses that library.
First:
pip3 install fast_diff_match_patch
Then write (this is Python 3):
from fast_diff_match_patch import diff
changes = diff("Hello world.", "Goodbye moon.")
for op, length in changes:
    if op == "-": print("next", length, "characters are deleted")
    if op == "=": print("next", length, "characters are in common")
    if op == "+": print("next", length, "characters are inserted")
The two textual arguments can be either strings or bytes.
Some keyword arguments are also available:
timelimit (default 0) gives the maximum running time in seconds, for when you want to ensure the result comes quickly. According to the Google docs, the diff stops searching once the time limit is exceeded and returns a valid diff, though it might not be the best one.
checklines (default True) is the same argument as in the diff_main subroutine of the main library. It is a Google thing and might speed up diffs over line-based text such as code.
cleanup (default "Semantic") is "Semantic", "Efficiency", or "No", and runs the corresponding cleanup subroutine after performing the diff.
Set counts_only (default True) to False to have the returned value be an array of tuples of operations and corresponding strings, rather than operations and the lengths of those strings.
If as_patch (default False) is True, the diff is returned in patch format as a string.
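To make the relationship between the two return shapes concrete, the sketch below expands a counts-only result into (operation, text) tuples using the original inputs. It is plain Python and not part of the library: expand_counts is a hypothetical helper, and the sample list is hand-written for illustration rather than actual diff output. It assumes that "=" and "-" lengths consume items from the first text and that "=" and "+" lengths consume items from the second.

```python
def expand_counts(changes, old, new):
    """Turn [(op, length), ...] into [(op, text), ...].

    Hypothetical helper, not part of fast_diff_match_patch. Assumes lengths
    index into `old` for "-" and "=", and into `new` for "+".
    """
    result = []
    i = 0  # current position in old
    j = 0  # current position in new
    for op, length in changes:
        if op == "=":
            result.append((op, old[i:i + length]))
            i += length
            j += length
        elif op == "-":
            result.append((op, old[i:i + length]))
            i += length
        elif op == "+":
            result.append((op, new[j:j + length]))
            j += length
        else:
            raise ValueError(f"unknown operation: {op!r}")
    if i != len(old) or j != len(new):
        raise ValueError("counts do not cover both inputs")
    return result


# Hand-written sample, not real library output.
old, new = "abcdef", "abXYef"
sample = [("=", 2), ("-", 2), ("+", 2), ("=", 2)]
expanded = expand_counts(sample, old, new)
```

With the library itself, passing counts_only=False returns the text form directly, so a helper like this is mainly useful for understanding or post-processing counts-only results.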
The Global Interpreter Lock (GIL) is released while performing the diff so that this library can be used in a multi-threaded application.
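Because the GIL is released during the diff, several diffs can in principle run in parallel on threads. A minimal sketch using the standard library's ThreadPoolExecutor follows. diff_many is a hypothetical helper, and it takes the diff function as a parameter so that the sketch does not depend on the package being installed. Any actual speedup depends on input sizes and core count and has not been measured here.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_many(diff_fn, pairs, max_workers=4, **kwargs):
    """Run diff_fn(a, b, **kwargs) for each (a, b) pair on a thread pool.

    Hypothetical helper. Results come back in the same order as `pairs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(diff_fn, a, b, **kwargs) for a, b in pairs]
        return [f.result() for f in futures]


# Intended use, assuming the package is installed:
#   from fast_diff_match_patch import diff
#   results = diff_many(diff, [(old1, new1), (old2, new2)], timelimit=5)
```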
Changes from earlier versions:
Added the as_patch argument.
The import name has been changed from diff_match_patch to fast_diff_match_patch to avoid an import naming collision with https://pypi.org/project/diff-match-patch/, and the package name has been updated to match the import name.
Previously, separate diff_bytes (Py3), diff_unicode, and diff_str (Py2) methods were available. They have been merged into a single diff method that checks the type of the arguments passed.
cleanup_semantic has been renamed to cleanup, which takes one of three options (see above).
To build from these sources, you will need:
The Python development headers and setuptools (python3-dev, python3-setuptools)
The diff-match-patch-cpp-stl submodule, fetched with git submodule update --init.
Then build/install the binary module using:
python setup.py build
python setup.py install
To build everything (for testing):
git submodule update && rm -rf build && python3 setup.py build
To test without installing:
PYTHONPATH=build/lib.linux-x86_64-*/ python3 -m unittest
Release packages (wheels and a source distribution) are built using GitHub Actions in this repository. To upload them as a new release to PyPI, download the artifact, extract the files to a new directory, and run:
python3 -m pip install --upgrade twine
python3 -m twine upload -u __token__ path-to-artifact-files/*
We found that fast-diff-match-patch demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.