Sesdiff Python binding
Description
This is a Python library that relates two strings and computes the
shortest edit script (Myers' diff algorithm) to go from string A to
string B. It also computes the edit distance (aka Levenshtein
distance).
There is also a command line version available.
Installation
pip install sesdiff
Usage
The shortest_edit_script function returns an EditScript instance containing
the instructions needed to get from string A to string B. Instructions are
represented as two-tuples, with the first element being a character representing
the edit instruction type (+ for insertion, - for deletion, = for identity)
and the second the actual textual content.
from sesdiff import shortest_edit_script, Mode
result = shortest_edit_script("hablaron","hablar")
assert result.distance() == 2
assert len(result) == 2
assert result[0] == ('=',"hablar")
assert result[1] == ('-',"on")
for instruction in result:
    print(instruction)
result = shortest_edit_script("hablaron","hablar", Mode.SUFFIX)
assert result.distance() == 2
assert len(result) == 1
assert result[0] == ('-',"on")
result = shortest_edit_script("говорим","говорить")
assert result.distance() == 3
assert len(result) == 3
assert result[0] == ('=',"говори")
assert result[1] == ('-',"м")
assert result[2] == ('+',"ть")
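To make the two-tuple format concrete, the following minimal sketch rebuilds both strings from a list of instructions and counts the inserted and deleted characters. It works on plain Python tuples copied from the assertions above, so it does not call sesdiff itself. The helper names reconstruct and changed_characters are hypothetical and not part of the library. It also assumes the script covers the whole string, so it does not apply to the Mode.SUFFIX result, which omits the identical prefix.

```python
def reconstruct(instructions):
    """Rebuild (string A, string B) from (type, text) two-tuples.

    Hypothetical helper, not part of sesdiff. Assumes the script covers
    the whole string (i.e. it also contains the '=' parts).
    """
    source, target = [], []
    for kind, text in instructions:
        if kind not in ("=", "-", "+"):
            raise ValueError(f"unknown instruction type: {kind!r}")
        if kind in ("=", "-"):  # text that is present in string A
            source.append(text)
        if kind in ("=", "+"):  # text that is present in string B
            target.append(text)
    return "".join(source), "".join(target)


def changed_characters(instructions):
    """Count the characters that are inserted or deleted (hypothetical helper)."""
    return sum(len(text) for kind, text in instructions if kind in ("+", "-"))


# Instruction tuples as asserted in the examples above.
script = [("=", "говори"), ("-", "м"), ("+", "ть")]
assert reconstruct(script) == ("говорим", "говорить")
assert changed_characters(script) == 3
```

For the two full scripts shown above, this count equals the values returned by distance(). Whether the library computes its distance this way internally is an assumption.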
Limitations
The apply functionality from the main library/CLI tool is not implemented yet.
Do not use this library if you're merely interested in computing Levenshtein
distance; it comes with performance overhead because it returns the actual edit scripts.
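Since the apply functionality is not available in this binding, a strict replay of a full edit script can be approximated in plain Python. The sketch below is an assumption about what such a replay could look like, not the behaviour of the main library's apply. The function name apply_script is hypothetical. It only handles scripts that cover the whole string, and it takes the instructions as a list of (type, text) tuples.

```python
def apply_script(source, instructions):
    """Replay a full edit script on `source` and return the resulting string.

    Hypothetical helper, not part of sesdiff. Raises ValueError when the
    script does not match `source` exactly.
    """
    pos = 0  # current read position in source
    out = []
    for kind, text in instructions:
        if kind == "+":
            out.append(text)  # inserted text is not read from source
            continue
        if kind not in ("=", "-"):
            raise ValueError(f"unknown instruction type: {kind!r}")
        if not source.startswith(text, pos):
            raise ValueError(f"script does not match source at offset {pos}")
        pos += len(text)
        if kind == "=":
            out.append(text)  # identical text is kept, deleted text is skipped
    if pos != len(source):
        raise ValueError("script does not cover the whole source string")
    return "".join(out)


# Instruction tuples as asserted in the Usage section.
assert apply_script("hablaron", [("=", "hablar"), ("-", "on")]) == "hablar"
```

The strict matching is a deliberate choice for this sketch: a script is only replayed on the exact string it was computed from, so a mismatch surfaces as an error instead of silently producing wrong output.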