fast-myers-diff

A fast, minimal, memory-efficient diff algorithm on strings, arrays, and typed arrays.

1.0.0
Source
npm

Version published: 5 years ago

Maintainers: 1

Created: 5 years ago

Source

Fast-Myers-Diff

This is a fast, compact, memory efficient implementation of the O(ND) Myers diff algorithm. The core diff algorithm, including blank lines and comment, is only 123 lines. With LCS and patch features added, the total library is still less than 200 lines. Minified and including type definitions, the published library is less than 3KB.

This implementation improves on a naive implementation of Myers recursive algorithm in several ways:

By using circular buffers for k-line computations, we achieve bounds of O(min(N,M) + D) space and O(min(N,M) * D) time, where N and M are the lengths of the input sequences and D is the number of differences.
The original recursive algorithm is replaced by an iterative version with a minimal stack handling altered parameters for right-recursion. All other recursive calls are tail calls replaced with simple jumps (via break or continue). Huge inputs may blow the heap, but you'll never overflow the stack!
Allocation is minimized by pre-allocating buffer space to be re-used by each simulated recursive call, and tracking indices into the original inputs. The core diff algorithm performs no slice operations or other copying of data.

Because the core algorithm does not slice or copy data, it depends only on being able to compare elements of the inputs at arbitrary indices. Thus, it automatically operates equally well on any indexable type--strings, basic arrays, or any flavor of typed array. Additionally, the library permits optimizing total application memory usage by producing output in the form of generators, rather than forcing you to accumulate the full output up-front.

The library exports the following interface:

type GenericIndexable = {
    [key: number]: unknown;
    readonly length: number;
};
type Indexable = string | unknown[] | TypedArray | GenericIndexable;
type Sliceable = TypedArray | (GenericIndexable & {
    slice(start: number, end?: number): Indexable;
});

function diff<T extends Indexable>(xs: T, ys: T): Generator<[number, number, number, number]>;
function lcs<T extends Indexable>(xs: T, ys: T): Generator<[number, number, number]>;

function calcPatch<T extends Sliceable>(xs: T, ys: T): Generator<[number, number, T]>;
function applyPatch<T extends Sliceable>(xs: T, patch: Iterable<[number, number, T]>): Generator<T>;

diff(xs, ys) is the core of the library; given two indexable sequences, xs and ys, it produces a sequence of quadruples [sx, ex, sy, ey], where [sx, ex) indicates a range to delete from xs and [sy, ey) indicates a range from ys to replace the deleted material with. Simple deletions are indicated when sy === ey and simple insertions when sx === ex.

lcs(xs, ys) uses the same underlying diff implementation, but instead produces output in the form of triples [sx, sy, l], where sx and sy are the starting idices in xs and ys respectively of an aligned common substring, and l is the length of said substring. Indexing into the original input sequences can be used to retrieve the actual LCS from this information.

calcPatch(xs, ys) is a thin wrapper over diff(xs, ys) which replaces the [sy, ey) indices with the relevant slice of ys. This can be used to reconstitute ys given xs. Once again, pure insertions are indicated when sx === ex, but pure deletions are indicated by an empty slice--i.e., an empty string, a zero-length array, etc. The insert slices are of the same type as the original ys. If ys is a string or an array, they are produced with the slice methods of strings or arrays, which will result in a shallow copy. If ys is a typed array, slices will be produced with TypedArray.prototype.subarray, which re-uses the existing underlying memory.

applyPatch(xs, patch) takes the output of calcPatch(xs, ys) and uses it to reconstitute the original elements of ys. The output is not, however, a single reconstituted Indexable, but a sequence of chunks taken alternately from xs and from the patch data. This is done for two reasons:

It avoids special-case code for joining each possible Indexable type;
As with all of the other library functions, it permits stream processing without deciding for you to allocate enough memory to hold the entire result at once.

This diff(xs, ys) and lcs(xs, ys) will also work with custom container types, as long as your container objects have a numeric length property. calcPatch(xs, ys) and 'applyPatch(xs, ys)will also with with custom types, provided that they also implement a suitableslice(start[, end])` method.

Keywords

FAQs

What is fast-myers-diff?

Is fast-myers-diff well maintained?

Package last updated on 02 Apr 2020

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

fast-myers-diff

Fast-Myers-Diff

Keywords

Related posts

PyPI’s New Archival Feature Closes a Major Security Gap

North Korean APT Lazarus Targets Developers with Malicious npm Package