Security News
Fluent Assertions Faces Backlash After Abandoning Open Source Licensing
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
fast-myers-diff
Advanced tools
A fast, minimal, memory-efficient diff algorithm on strings, arrays, and typed arrays.
This is a fast, compact, memory efficient implementation of the O(ND) Myers diff algorithm. Minified and including type definitions, the published library is less than 4KB.
This implementation improves on a naive implementation of Myers recursive algorithm in several ways:
break
or continue
). Huge inputs may blow the heap, but you'll never overflow the stack!Because the core algorithm does not slice or copy data, it depends only on being able to compare elements of the inputs at arbitrary indices. Thus, it automatically operates equally well on any indexable type--strings, basic arrays, or any flavor of typed array. Additionally, the library permits optimizing total application memory usage by producing output in the form of generators, rather than forcing you to accumulate the full output up-front.
'words'
, 'chars'
or 'line'
compare modes as well as custom regular expressions.All three libraries have the ability to compute character differences between strings.
The library exports the following interface:
type GenericIndexable = {
[key: number]: unknown;
readonly length: number;
};
type Indexable = string | unknown[] | TypedArray | GenericIndexable;
interface Sliceable extends GenericIndexable {
slice(start: number, end?: number): this;
}
declare function diff_core(i: number, N: number, j: number, M: number, eq: (i: number, j: number) => boolean): Generator<Vec4>;
declare function diff<T extends Indexable>(xs: T, ys: T): Generator<[number, number, number, number]>;
declare function lcs<T extends Indexable>(xs: T, ys: T): Generator<[number, number, number]>;
declare function calcPatch<T extends Sliceable>(xs: T, ys: T): Generator<[number, number, T]>;
declare function applyPatch<T extends Sliceable>(xs: T, patch: Iterable<[number, number, T]>): Generator<T>;
diff_core(i, N, j, M, eq)
is the core of the library; given starting indices i
and j
, and slice-lengths N
and M
(i.e., the remaining length of the relevane sequence after the starting index), it produces a sequence of quadruples [sx, ex, sy, ey]
, where [sx, ex) indicates a range to delete from xs
and [sy, ey) indicates a range from ys
to replace the deleted material with. Simple deletions are indicated when sy === ey
and simple insertions when sx === ex
. Note that direct access to the sequences themselves is not required; instead, diff_core
, take a callback function eq
which is used to determine whether the relevant sequences are equal at given indices. Note that lacking access to the actual sequences being diffed ensures that the library cannot sacrifice efficiency by making temporary copies.
By writing your own eq
implementation, it is possible to compute diffs of sequences of types which are not normally comparable (e.g., arrays of objects where you wish to use value equality rather than reference equality), and even to get diffs of data structures which are not natively indexable. Despite the overhead of making a function call for comparisons, this diff implementation is still significantly faster than fast-diff
when the size of the diff is significant, as the speed offast-diff
's native string comparisons becomes less important.
diff(xs, ys)
is a wrapper around diff_core
which checks for common affixes (reducing the memory consumption and time spent in the core diff algorithm) and calculates i
, j
, N
, M
and eq
automatically.
lcs(xs, ys)
calls diff
internally, but pre-processes the output to produce triples of the form [sx, sy, l]
, where sx
and sy
are the starting idices in xs
and ys
respectively of an aligned common substring, and l
is the length of said substring. Indexing into the original input sequences can be used to retrieve the actual Longest Common Subsequence from this information, but the lcs
function itself does not attempt to take slices of the inputs.
calcPatch(xs, ys)
is a thin wrapper over diff
which replaces the [sy, ey) indices with the relevant slice of ys
. This can be used to reconstitute ys
given xs
. Once again, pure insertions are indicated when sx === ex
, but pure deletions are indicated by an empty slice--i.e., an empty string, a zero-length array, etc. The insert slices are of the same type as the original ys
. If ys
is a string or an array, they are produced with the slice
methods of strings or arrays, which will result in a shallow copy. If ys
is a typed array, slices will be produced with TypedArray.prototype.subarray
, which re-uses the existing underlying memory.
applyPatch(xs, patch)
takes the output of calcPatch(xs, ys)
and uses it to reconstitute the original elements of ys
. The output is not, however, a single reconstituted Indexable
, but a sequence of chunks taken alternately from xs
and from the patch
data. This is done for two reasons:
Indexable
type;diff_rec
, diff
and lcs
will also work with custom container types, as long as your container objects have a numeric length
property. calcPatch
and applyPatch
will work with custom types provided that they also implement a suitable slice(start[, end])
method.
The table below gives the number of operations per second reported by benchmark on a Windows 10 with Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz.
input | fast-myers-diff | fast-diff-1.2.0 | myers-diff-2.0.1 | fast-myers-diff-2.0.0 |
---|---|---|---|---|
10, +100, -100 | 1,139 ops/sec | 2,724 ops/sec | 768 ops/sec | 1,115 ops/sec |
10, +4, -200 | 4,217 ops/sec | 9,094 ops/sec | 875 ops/sec | 4,119 ops/sec |
100, +10, -10 | 40,825 ops/sec | 14,531 ops/sec | 1,049 ops/sec | 42,327 ops/sec |
100, +20, -0 | 43,265 ops/sec | 18,649 ops/sec | 976 ops/sec | 44,582 ops/sec |
100, +0, -20 | 45,387 ops/sec | 15,867 ops/sec | 988 ops/sec | 48,545 ops/sec |
10, +1000, -1000 | 12.06 ops/sec | 32.86 ops/sec | 7.23 ops/sec | Not supported |
10000, +100, -100 | 587 ops/sec | 99.70 ops/sec | 0.23 ops/sec | Not supported |
10000, +200, -0 | 685 ops/sec | 95.26 ops/sec | 0.23 ops/sec | Not supported |
10000, +0, -200 | 705 ops/sec | 106 ops/sec | 0.24 ops/sec | Not supported |
10000, +10, -10 | 2,905 ops/sec | 64.11 ops/sec | 0.28 ops/sec | Not supported |
10000, +20, -0 | 3,378 ops/sec | 68.45 ops/sec | 0.26 ops/sec | Not supported |
10000, +0, -20 | 3,730 ops/sec | 59.50 ops/sec | 0.27 ops/sec | Not supported |
fast-myers-diff@2.0.0
used Uint8Array
to save indices, so it can only correctly handle inputs with added length less than 256.
fast-diff
is faster than fast-myers-diff
for inputs in which the longest common string is a small portion of the sequences. For differences of 20% fast-myers-diff
is about 6x faster, for differences of 2% about 50x faster.
FAQs
A fast, minimal, memory-efficient diff algorithm on strings, arrays, and typed arrays.
The npm package fast-myers-diff receives a total of 29,873 weekly downloads. As such, fast-myers-diff popularity was classified as popular.
We found that fast-myers-diff demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
Research
Security News
Socket researchers uncover the risks of a malicious Python package targeting Discord developers.
Security News
The UK is proposing a bold ban on ransomware payments by public entities to disrupt cybercrime, protect critical services, and lead global cybersecurity efforts.