regenerate
Comparing version 0.5.4 to 0.6.0
package.json
{
"name": "regenerate",
"version": "0.5.4",
"version": "0.6.0",
"description": "Generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points.",
@@ -37,8 +37,7 @@ "homepage": "http://mths.be/regenerate",
},
"dependencies": {},
"devDependencies": {
"grunt": "~0.4.1",
"grunt-shell": "~0.2.2",
"grunt-shell": "~0.3.1",
"istanbul": "~0.1.36",
"qunit-clib": "~1.3.0",
"qunit-extras": "~1.0.0",
"qunitjs": "~1.11.0",
@@ -45,0 +44,0 @@ "requirejs": "~2.1.6"
310
README.md
@@ -1,4 +0,4 @@
# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.png?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Dependency status](https://gemnasium.com/mathiasbynens/regenerate.png)](https://gemnasium.com/mathiasbynens/regenerate)
# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.svg?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Dependency status](https://gemnasium.com/mathiasbynens/regenerate.svg)](https://gemnasium.com/mathiasbynens/regenerate)
_Regenerate_ is a Unicode-aware regex generator for JavaScript. It allows you to easily generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points.
_Regenerate_ is a Unicode-aware regex generator for JavaScript. It allows you to easily generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points. (This is trickier than you might think, because of [how JavaScript deals with astral symbols](http://mathiasbynens.be/notes/javascript-unicode).)
@@ -9,3 +9,3 @@ Feel free to fork if you see possible improvements!
Via [npm](http://npmjs.org/):
Via [npm](https://npmjs.org/):
@@ -78,12 +78,12 @@ ```bash
.remove(0x62, 0x64) // remove U+0062 and U+0064
.add(0x1D306) // add U+1D306
.add(0x1D306); // add U+1D306
set.valueOf();
// → [0x60, 0x61, 0x63, 0x65, 0x66, 0x67, 0x68, 0x69, 0x1D306]
set.toString();
// → '[\\x60-ace-i]|\\uD834\\uDF06'
// → '[`ace-i]|\\uD834\\uDF06'
set.toRegExp();
// → /[\x60-ace-i]|\uD834\uDF06/
// → /[`ace-i]|\uD834\uDF06/
```
Any arguments passed to `regenerate()` will be added to the set right away. Both code points (numbers) as symbols (strings consisting of a single Unicode symbol) are accepted.
Any arguments passed to `regenerate()` will be added to the set right away. Both code points (numbers) as symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.
@@ -93,2 +93,6 @@ ```js
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
var items = [0x1D306, 'A', '©', 0x2603];
regenerate(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
@@ -103,4 +107,23 @@
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
var items = [0x1D306, 'A', '©', 0x2603];
regenerate().add(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
It’s also possible to pass in a Regenerate instance. Doing so adds all code points in that instance to the current set.
```js
var set = regenerate(0x1D306, 'A');
regenerate().add('©', 0x2603).add(set).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
Note that the initial call to `regenerate()` acts like `add()`. This allows you to create a new Regenerate instance and add some code points to it in one go:
```js
regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
### `regenerate.prototype.remove(value1, value2, value3, ...)`
@@ -115,9 +138,8 @@
Functions can also be passed. In that case, the result of calling the function against a code point value in the set determines whether the element should be removed (`true`) or not (`false`).
It’s also possible to pass in a Regenerate instance. Doing so removes all code points in that instance from the current set.
```js
regenerate(0x1D306, 'A', '©', 0x2603).remove(function(codePoint) {
return codePoint > 0xFFFF; // remove astral code points from the set
}).toString();
// → '[A\\xA9\\u2603]'
var set = regenerate('☃');
regenerate(0x1D306, 'A', '©', 0x2603).remove(set).toString();
// → '[A\\xA9]|\\uD834\\uDF06'
```
@@ -146,3 +168,3 @@
.toString();
// → '[\\0-\\x40\\x7B-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF]'
// → '[\\0-@\\{-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF]'
@@ -153,3 +175,3 @@ regenerate()
.toString();
// → '[\\0-\\x40\\x7B-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF]'
// → '[\\0-@\\{-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF]'
```
@@ -166,5 +188,16 @@
.toString();
// → '[\0-\x60b-rt-\xFF]'
// → '[\\0-`b-rt-\\xFF]'
```
Instead of the `codePoints` array, it’s also possible to pass in a Regenerate instance.
```js
var blacklist = regenerate().add(0x61, 0x73);
var setB = regenerate()
.addRange(0x00, 0xFF) // add extended ASCII code points
.difference(blacklist) // remove the code points in the `blacklist` set from this set
.toString();
// → '[\\0-`b-rt-\\xFF]'
```
### `regenerate.prototype.intersection(codePoints)`
@@ -182,2 +215,14 @@
Instead of the `codePoints` array, it’s also possible to pass in a Regenerate instance.
```js
var whitelist = regenerate(0x61, 0x69);
regenerate()
.addRange(0x00, 0xFF) // add extended ASCII code points
.intersection(whitelist) // remove all code points from the set except for those in the `whitelist` set
.toString();
// → '[ai]'
```
### `regenerate.prototype.contains(value)`
@@ -195,2 +240,15 @@
### `regenerate.prototype.clone()`
Returns a clone of the current code point set. Any actions performed on the clone won’t mutate the original set.
```js
var setA = regenerate(0x1D306);
var setB = setA.clone().add(0x1F4A9);
setA.toArray();
// → [0x1D306]
setB.toArray();
// → [0x1D306, 0x1F4A9]
```
### `regenerate.prototype.toString()`
@@ -236,221 +294,31 @@
### `regenerate.fromCodePoints(codePoints)`
## Combine Regenerate with other libraries
This function takes an array of numerical code point values and returns a string representing (part of) a regular expression that would match all the symbols mapped to those code points.
Regenerate gets even better when combined with other libraries such as [Punycode.js](http://mths.be/punycode). Here’s an example where [Punycode.js](http://mths.be/punycode) is used to convert a string into an array of code points, that is then passed on to Regenerate:
```js
// Create a regular expression that matches any of the given code points:
regenerate.fromCodePoints([0x1F604, 0x1F605, 0x1F606, 0x1F607]);
// → '\\uD83D[\\uDE04-\\uDE07]'
```
var regenerate = require('regenerate');
var punycode = require('punycode');
### `regenerate.fromCodePointRange(start, end)`
var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all code points used in the string:
var codePoints = punycode.ucs2.decode(string);
This function takes a `start` and an `end` code point value, and returns a string representing (part of) a regular expression that would match all the symbols mapped to the code points within the range _[start, end]_ (inclusive).
```js
// Create a regular expression that matches any code point in the given range:
regenerate.fromCodePointRange(0x1F604, 0x1F607);
// → '\\uD83D[\\uDE04-\\uDE07]'
// Create a regular expression that matches any Unicode code point:
regenerate.fromCodePointRange(0x000000, 0x10FFFF);
// → '[\\0-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF]'
// Generate a regular expression that matches any of the symbols used in the string:
regenerate(codePoints).toString();
// → '[ \\.Ladeilmopr-u]'
```
### `regenerate.fromCodePointRanges(ranges)`
In ES6 you can do something similar with [`Array.from`](http://mths.be/array-from) which uses [the string’s iterator](http://mathiasbynens.be/notes/javascript-unicode#iterating-over-symbols) to split the given string into an array of strings that each contain a single symbol. [`regenerate()`](#regenerateprototypeaddvalue1-value2-value3-) accepts both strings and code points, remember?
This function takes an array of code point ranges or separate code points, and returns a string representing (part of) a regular expression that would match all the symbols mapped to the code points within the listed code points or code point ranges.
```js
// Create a regular expression based on a dynamically created range of code points:
regenerate.fromCodePointRanges([
[0x00, 0xFF], // range
[0x2603, 0x2608], // range
0x1F4A9, // separate code point
0x1F4BB // separate code point
]);
// → '[\\0-\\xFF\\u2603-\\u2608]|\\uD83D[\\uDCA9\\uDCBB]'
```
```js
// Allow all Unicode symbols except U+2603 SNOWMAN and U+1F4A9 PILE OF POO:
regenerate.fromCodePointRanges([
[0x0000, 0x2602], // skip 0x2603
[0x2604, 0x1F4A8], // skip 0x1F4A9
[0x1F4AA, 0x10FFFF]
]);
// → '[\\0-\\u2602\\u2604-\\uD7FF\\uDC00-\\uFFFF]|[\\uD800-\\uD83C\\uD83E-\\uDBFF][\\uDC00-\\uDFFF]|\\uD83D[\\uDC00-\\uDCA8\\uDCAA-\\uDFFF]|[\\uD800-\\uDBFF]'
```
### `regenerate.fromSymbols(symbols)`
This function takes an array of strings that each contain a single Unicode symbol. It returns a string representing (part of) a regular expression that would match all those symbols.
```js
// Create a regular expression that matches any of the given Unicode symbols:
regenerate.fromSymbols(['𝐀', '𝐁', '𝐂', '𝐃', '𝐄']);
// → '\\uD835[\\uDC00-\\uDC04]'
```
### `regenerate.fromSymbolRange(start, end)`
This function takes a `start` and an `end` string which each contain a single Unicode symbol. It returns a string representing (part of) a regular expression that would match all the symbols within the range _[start, end]_ (inclusive).
```js
// Create a regular expression that matches any Unicode symbol in the given range:
regenerate.fromSymbolRange('𝐏', '𝐟');
// → '\\uD835[\\uDC0F-\\uDC1F]'
```
### `regenerate.fromSymbolRanges(ranges)`
This function takes an array of symbol ranges or separate strings, each containing a single Unicode symbol, and returns a string representing (part of) a regular expression that would match all the symbols within the listed symbols or symbol ranges.
```js
// Create a regular expression based on a dynamically created range of code points:
regenerate.fromSymbolRanges([
['\0', '\xFF'], // range
['\u2603', '\u2608'], // range
'\uD83D\uDCA9', // separate symbol
'\uD83D\uDCBB' // separate symbol
]);
// → '[\\0-\\xFF\\u2603-\\u2608]|\\uD83D[\\uDCA9\\uDCBB]'
```
### `regenerate.range(start, end)`
This function takes a `start` and an `end` number and returns an array of numbers progressing from `start` up to and including `end`, i.e. all the numbers within the range _[start, end]_ (inclusive).
```js
// Create an array containing all extended ASCII code points:
regenerate.range(0x00, 0xFF);
// → [0x00, 0x01, 0x02, 0x03, ..., 0xFF]
```
### `regenerate.ranges(ranges)`
This function takes an array of code point ranges or separate code points, and returns an array containing all the code points within the listed code points or code point ranges.
```js
// Create a regular expression based on a dynamically created range of code points:
var codePoints = regenerate.ranges([
[0x00, 0xFF], // → 0x00, 0x01, 0x02, 0x03, …, 0xFC, 0xFD, 0xFE, 0xFF
[0x2603, 0x2608], // → 0x2603, 0x2604, 0x2605, 0x2606, 0x2607, 0x2608
0x1F4A9, // add U+1F4A9 PILE OF POO
0x1F4BB // add U+1F4BB PERSONAL COMPUTER
]);
// → [0x00, 0x01, …, 0xFE, 0xFF, 0x2603, 0x2604, …, 0x2607, 0x2608, 0x1F4A9, 0x1F4BB]
regenerate.fromCodePoints(codePoints);
// → '[\\0-\\xFF\\u2603-\\u2608]|\\uD83D[\\uDCA9\\uDCBB]'
```
### `regenerate.contains(array, value)`
Returns `true` if `array` contains `value`, and `false` otherwise.
```js
var ASCII = regenerate.range(0x00, 0xFF); // extended ASCII
// → [0x00, 0x01, 0x02, 0x03, ..., 0xFF]
regenerate.contains(ASCII, 0x61);
// → true
regenerate.contains(ASCII, 0x1D306);
// → false
```
### `regenerate.difference(array1, array2)`
Returns an array of `array1` elements that are not present in `array2`.
```js
regenerate.difference(
[0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06],
[0x01, 0x03, 0x05]
);
// → [0x00, 0x02, 0x04, 0x06]
```
### `regenerate.intersection(array1, array2)`
Returns an array of unique elements that are present in both `array1` and `array2`.
```js
regenerate.intersection(
[0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06],
[0x01, 0x03, 0x05, 0x07]
);
// → [0x01, 0x03, 0x05]
```
### `regenerate.add(array, value)`
Extends `array` based on `value` as follows:
* If `value` is a code point (i.e. a number), it’s appended to `array`.
* If `value` is a symbol (i.e. a string containing a single Unicode symbol), its numeric code point value is appended to `array`.
* If `value` is an array, all its values are added to `array` following the above steps.
```js
regenerate.add(
[0x00, 0x1D306],
0x41
);
// → [0x00, 0x1D306, 0x41]
regenerate.add(
[0x00, 0x1D306],
'A'
);
// → [0x00, 0x1D306, 0x41]
regenerate.add(
[0x00, 0x1D306],
[0x61, 0x203B, 'A']
);
// → [0x00, 0x1D306, 0x61, 0x203B, 0x41]
```
### `regenerate.remove(array, value)`
Removes values from `array` based on `value` as follows:
* If `value` is a code point (i.e. a number), it’s removed from `array`.
* If `value` is a symbol (i.e. a string containing a single Unicode symbol), its numeric code point value is removed from `array`.
* If `value` is an array, all its values are removed from `array` following on the above steps.
```js
regenerate.remove(
[0x00, 0x1D306, 0x41],
0x41
);
// → [0x00, 0x1D306]
regenerate.remove(
[0x00, 0x1D306, 0x41],
'A'
);
// → [0x00, 0x1D306]
regenerate.remove(
[0x00, 0x1D306, 0x61, 0x203B, 0x41],
[0x61, 0x203B, 'A']
);
// → [0x00, 0x1D306]
```
## Combine Regenerate with other libraries
Regenerate gets even better when combined with other libraries such as [Punycode.js](http://mths.be/punycode). Here’s an example where [Punycode.js](http://mths.be/punycode) is used to convert a string into an array of code points, that is then passed on to Regenerate:
```js
var regenerate = require('regenerate');
var punycode = require('punycode');
var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all code points used in the string:
var codePoints = punycode.ucs2.decode(string);
// Get an array of all symbols used in the string:
var codePoints = Array.from(string);
// Generate a regular expression that matches any of the symbols used in the string:
regenerate(codePoints).toString();
// → '[\\x20\\x2ELad-eil-mo-pr-u]'
// → '[ \\.Ladeilmopr-u]'
```
@@ -460,3 +328,3 @@
Regenerate has been tested in at least Chrome 27-29, Firefox 3-22, Safari 4-6, Opera 10-12, IE 6-10, Node.js v0.10.0, Narwhal 0.3.2, RingoJS 0.8-0.9, PhantomJS 1.9.0, and Rhino 1.7RC4.
Regenerate supports at least Chrome 27+, Firefox 3+, Safari 4+, Opera 10+, IE 6+, Node.js v0.10.0+, Narwhal 0.3.2+, RingoJS 0.8+, PhantomJS 1.9.0+, and Rhino 1.7RC4+.
@@ -473,3 +341,3 @@ ## Unit tests & code coverage
| [![twitter/mathias](http://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](http://twitter.com/mathias "Follow @mathias on Twitter") |
| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
@@ -476,0 +344,0 @@ | [Mathias Bynens](http://mathiasbynens.be/) | |
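The README's remark that astral symbols are tricky comes down to UTF-16: JavaScript strings store any code point above U+FFFF as two code units, a high surrogate followed by a low surrogate, which is why U+1D306 appears as `\\uD834\\uDF06` in the outputs above. The sketch below shows the standard UTF-16 arithmetic; `toSurrogatePair` and `toEscape` are hypothetical helpers for illustration, not part of Regenerate's API.

```javascript
// Hypothetical helper (not part of Regenerate): split an astral code point
// (U+10000 to U+10FFFF) into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Not an astral code point: ' + codePoint);
  }
  var offset = codePoint - 0x10000;               // 20-bit offset
  var high = 0xD800 + Math.floor(offset / 0x400); // upper 10 bits
  var low = 0xDC00 + (offset % 0x400);            // lower 10 bits
  return [high, low];
}

// Hypothetical helper: format a surrogate as a `\uXXXX` escape, the way the
// README's output strings spell them (surrogates always need four hex digits).
function toEscape(codeUnit) {
  return '\\u' + codeUnit.toString(16).toUpperCase();
}

var pair = toSurrogatePair(0x1D306);
console.log(pair.map(toEscape).join('')); // → \uD834\uDF06
console.log('\uD834\uDF06'.length);       // → 2 (one symbol, two code units)
```

Because each astral symbol spans two code units, a character class such as `[\uD834\uDF06]` would match either surrogate on its own, so astral symbols have to be emitted as surrogate sequences outside the class, as in `[A\\xA9\\u2603]|\\uD834\\uDF06`.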
1289
regenerate.js
@@ -1,8 +0,8 @@
/*! http://mths.be/regenerate v0.5.4 by @mathias | MIT license */
/*! http://mths.be/regenerate v0.6.0 by @mathias | MIT license */
;(function(root) {
// Detect free variables `exports`
// Detect free variables `exports`.
var freeExports = typeof exports == 'object' && exports;
// Detect free variable `module`
// Detect free variable `module`.
var freeModule = typeof module == 'object' && module &&
@@ -12,3 +12,3 @@ module.exports == freeExports && module;
// Detect free variable `global`, from Node.js or Browserified code,
// and use it as `root`
// and use it as `root`.
var freeGlobal = typeof global == 'object' && global;
@@ -21,8 +21,21 @@ if (freeGlobal.global === freeGlobal || freeGlobal.window === freeGlobal) {
var ERRORS = {
'rangeOrder': 'A range\u2019s `stop` value must be greater than or equal ' +
'to the `start` value.',
'codePointRange': 'Invalid code point value. Code points range from ' +
'U+000000 to U+10FFFF.'
};
// http://mathiasbynens.be/notes/javascript-encoding#surrogate-pairs
var HIGH_SURROGATE_MIN = 0xD800;
var HIGH_SURROGATE_MAX = 0xDBFF;
var LOW_SURROGATE_MIN = 0xDC00;
var LOW_SURROGATE_MAX = 0xDFFF;
// In Regenerate output, `\0` will never be preceded by `\` because we sort
// by code point value, so let’s keep this regular expression simple.
var regexNull = /\\x00([^0123456789]|$)/g;
var object = {};
var hasOwnProperty = object.hasOwnProperty;
var hasKey = function(object, key) {
return hasOwnProperty.call(object, key);
};
var extend = function(destination, source) {
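The comment above `regexNull` is terse. A `\x00` escape in generated output can be shortened to `\0`, but only when the next character is not a digit: in a non-Unicode regular expression, `\01` is read as a legacy octal escape for U+0001, not as U+0000 followed by `1`. The sketch below reuses the pattern from the diff; the helper name `shortenNull` and the `'\\0$1'` replacement string are assumptions for illustration, since the call site that uses `regexNull` is not part of this diff.

```javascript
// Same pattern as `regexNull` in the diff: a literal `\x00` escape followed
// by a non-digit character (captured in group 1) or by the end of the input.
var regexNull = /\\x00([^0123456789]|$)/g;

// Hypothetical helper: replace `\x00` with the shorter `\0` wherever the
// result stays unambiguous, re-emitting the captured character after it.
function shortenNull(pattern) {
  return pattern.replace(regexNull, '\\0$1');
}

console.log(shortenNull('[\\x00-\\x40]')); // → [\0-\x40]
console.log(shortenNull('\\x00'));         // → \0
// Not shortened: `\01` would be read as a legacy octal escape (U+0001).
console.log(shortenNull('[\\x001-9]'));    // → [\x001-9]
```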
@@ -38,27 +51,2 @@ var key;
var toString = object.toString;
var isArray = function(value) {
return toString.call(value) == '[object Array]';
};
var isNumber = function(value) {
return typeof value == 'number' ||
toString.call(value) == '[object Number]';
};
var isString = function(value) {
return typeof value == 'string' ||
toString.call(value) == '[object String]';
};
var isFunction = function(value) {
return typeof value == 'function';
};
var map = function(array, callback) {
var index = -1;
var length = array.length;
while (++index < length) {
array[index] = callback(array[index]);
}
return array;
};
var forEach = function(array, callback) {
@@ -72,36 +60,11 @@ var index = -1;
var forOwn = function(object, callback) {
var key;
for (key in object) {
if (hasKey(object, key)) {
callback(key, object[key]);
}
}
var toString = object.toString;
var isArray = function(value) {
return toString.call(value) == '[object Array]';
};
var append = function(object, key, value) {
if (hasKey(object, key)) {
object[key].push(value);
} else {
object[key] = [value];
}
var isNumber = function(value) {
return typeof value == 'number' ||
toString.call(value) == '[object Number]';
};
var sortUniqueNumbers = function(array) {
// Sort numerically in ascending order
array = array.sort(function(a, b) {
return a - b;
});
// Remove duplicates
var previous;
var result = [];
forEach(array, function(item, index) {
if (previous != item) {
result.push(item);
previous = item;
}
});
return result;
};
// This assumes that `number` is a positive integer that `toString()`s nicely
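The hunk that follows swaps the old helpers, which worked on flat arrays of individual code points, for functions that work on a `data` array of half-open `[start, end)` pairs: each pair stores the first code point of a run and the code point just past its last one, which is why the comments describe `[0, 11]` as covering 0 through 10. The sketch below illustrates that encoding with three hypothetical helpers (`fromCodePoints`, `contains`, `toCodePoints`). It is written independently of Regenerate's `dataFromCodePoints` and `dataContains` and, like them as far as this diff shows, assumes the input code points are already sorted and unique.

```javascript
// Sketch of a half-open range-pair encoding: each two entries [start, end)
// stand for the run start, start + 1, ..., end - 1.
// Assumes `codePoints` is sorted in ascending order with no duplicates.
function fromCodePoints(codePoints) {
  var data = [];
  for (var i = 0; i < codePoints.length; i++) {
    var codePoint = codePoints[i];
    if (data.length > 0 && data[data.length - 1] === codePoint) {
      // Continues the current run: move its exclusive end up by one.
      data[data.length - 1] = codePoint + 1;
    } else {
      // Starts a new run containing just this code point.
      data.push(codePoint, codePoint + 1);
    }
  }
  return data;
}

// Linear scan over the pairs: is `codePoint` inside any [start, end)?
function contains(data, codePoint) {
  for (var i = 0; i < data.length; i += 2) {
    if (codePoint >= data[i] && codePoint < data[i + 1]) {
      return true;
    }
  }
  return false;
}

// Expand the pairs back into a flat array of code points.
function toCodePoints(data) {
  var result = [];
  for (var i = 0; i < data.length; i += 2) {
    for (var codePoint = data[i]; codePoint < data[i + 1]; codePoint++) {
      result.push(codePoint);
    }
  }
  return result;
}

var data = fromCodePoints([0x61, 0x62, 0x63, 0x65, 0x1D306]);
console.log(JSON.stringify(data));               // → [97,100,101,102,119558,119559]
console.log(contains(data, 0x64));               // → false
console.log(contains(data, 0x1D306));            // → true
console.log(JSON.stringify(toCodePoints(data))); // → [97,98,99,101,119558]
```

Runs are what keep large sets small in this representation: following the logic of `dataAddRange` in the diff, adding the whole range U+0000 to U+10FFFF to an empty set stores just the two numbers `0x0` and `0x110000`.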
@@ -125,123 +88,408 @@ // (which is the case for all code point values).
var range = function(start, stop) {
// inclusive, e.g. `range(1, 3)` → `[1, 2, 3]`
if (stop < start) {
throw Error('A range\u2019s `stop` value must be greater than or equal ' +
'to the `start` value.');
var dataFromCodePoints = function(codePoints) {
var index = -1;
var length = codePoints.length;
var max = length - 1;
var result = [];
var isStart = true;
var tmp;
var previous = 0;
while (++index < length) {
tmp = codePoints[index];
if (isStart) {
result.push(tmp);
previous = tmp;
isStart = false;
} else {
if (tmp == previous + 1) {
if (index != max) {
previous = tmp;
continue;
} else {
isStart = true;
result.push(tmp + 1);
}
} else {
// End the previous range and start a new one.
result.push(previous + 1, tmp);
previous = tmp;
}
}
}
for (var result = []; start <= stop; result.push(start++));
if (!isStart) {
result.push(tmp + 1);
}
return result;
};
var ranges = function(codePointRanges) {
if (!isArray(codePointRanges)) {
throw TypeError('ranges(): The `codePointRanges` argument must be an ' +
'array.');
var dataRemove = function(data, codePoint) {
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var length = data.length;
while (index < length) {
start = data[index];
end = data[index + 1];
if (codePoint >= start && codePoint < end) {
// Modify this pair.
if (codePoint == start) {
if (end == start + 1) {
// Just remove `start` and `end`.
data.splice(index, 2);
return data;
} else {
// Just replace `start` with a new value.
data[index] = codePoint + 1;
return data;
}
} else if (codePoint == end - 1) {
// Just replace `end` with a new value.
data[index + 1] = codePoint;
return data;
} else {
// Replace `[start, end]` with `[startA, endA, startB, endB]`.
data.splice(index, 2, start, codePoint, codePoint + 1, end);
return data;
}
}
index += 2;
}
return data;
};
if (!codePointRanges.length) {
return [];
var dataRemoveRange = function(data, rangeStart, rangeEnd) {
if (rangeEnd < rangeStart) {
throw Error(ERRORS.rangeOrder);
}
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
while (index < data.length) {
start = data[index];
end = data[index + 1];
var codePoints = [];
forEach(codePointRanges, function(codePointRange) {
// If it’s a single code point (not a range)
if (!isArray(codePointRange)) {
codePoints.push(codePointRange);
return;
// Exit as soon as no more matching pairs can be found.
if (start > rangeEnd) {
return data;
}
// If it’s a range (not a single code point)
var start = codePointRange[0];
var stop = codePointRange[1];
codePoints = codePoints.concat(range(start, stop));
});
return codePoints;
// Check if this range pair is equal to, or forms a subset of, the range
// to be removed.
// E.g. we have `[0, 11, 40, 51]` and want to remove 0-10 → `[40, 51]`.
// E.g. we have `[40, 51]` and want to remove 0-100 → `[]`.
if (rangeStart <= start && rangeEnd + 1 >= end) {
// Remove this pair.
data.splice(index, 2);
continue;
}
// Check if both `rangeStart` and `rangeEnd` are within the bounds of
// this pair.
// E.g. we have `[0, 11]` and want to remove 4-6 → `[0, 4, 7, 11]`.
if (rangeStart >= start && rangeEnd < end) {
// Replace `[start, end]` with `[startA, endA, startB, endB]`.
data.splice(index, 2, start, rangeStart, rangeEnd + 1, end);
return data;
}
// Check if only `rangeStart` is within the bounds of this pair.
// E.g. we have `[0, 11]` and want to remove 4-20 → `[0, 4]`.
if (rangeStart >= start && rangeStart < end) {
// Replace `end` with `rangeStart`.
data[index + 1] = rangeStart;
// Note: we cannot `return` just yet, in case any following pairs still
// contain matching code points.
// E.g. we have `[0, 11, 14, 31]` and want to remove 4-20
// → `[0, 4, 21, 31]`.
}
// Check if only `rangeEnd` is within the bounds of this pair.
// E.g. we have `[14, 31]` and want to remove 4-20 → `[21, 31]`.
else if (rangeEnd >= start && rangeEnd < end) {
// Just replace `start`.
data[index] = rangeEnd + 1;
return data;
}
index += 2;
}
return data;
};
var contains = function(array, value) {
var index = -1;
var length = array.length;
while (++index < length) {
if (array[index] == value) {
return true;
var dataAdd = function(data, codePoint) {
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var lastIndex = null;
var length = data.length;
if (codePoint < 0x0 || codePoint > 0x10FFFF) {
throw RangeError(ERRORS.codePointRange);
}
while (index < length) {
start = data[index];
end = data[index + 1];
// Check if the code point is already in the set.
if (codePoint >= start && codePoint < end) {
return data;
}
if (codePoint == start - 1) {
// Just replace `start` with a new value.
data[index] = codePoint;
return data;
}
// At this point, if `start` is `greater` than `codePoint`, insert a new
// `[start, end]` pair before the current pair, or after the current pair
// if there is a known `lastIndex`.
if (start > codePoint) {
data.splice(
lastIndex != null ? lastIndex + 2 : 0,
0,
codePoint,
codePoint + 1
);
return data;
}
if (codePoint == end) {
// Check if adding this code point causes two separate ranges to become
// a single range, e.g. `dataAdd([0, 4, 5, 10], 4)` → `[0, 10]`.
if (codePoint + 1 == data[index + 2]) {
data.splice(index, 4, start, data[index + 3]);
return data;
}
// Else, just replace `end` with a new value.
data[index + 1] = codePoint + 1;
return data;
}
lastIndex = index;
index += 2;
}
return false;
// The loop has finished; add the new pair to the end of the data set.
data.push(codePoint, codePoint + 1);
return data;
};
var difference = function(a, b) {
var index = -1;
var length = a.length;
var result = [];
var value;
while (++index < length) {
value = a[index];
if (!contains(b, value)) {
result.push(value);
var dataAddData = function(dataA, dataB) {
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var data = dataA.slice();
var length = dataB.length;
while (index < length) {
start = dataB[index];
end = dataB[index + 1] - 1;
if (start == end) {
data = dataAdd(data, start);
} else {
data = dataAddRange(data, start, end);
}
index += 2;
}
return result;
return data;
};
var intersection = function(a, b) {
var index = -1;
var length = a.length;
var result = [];
var value;
while (++index < length) {
value = a[index];
if (contains(b, value)) {
result.push(value);
var dataRemoveData = function(dataA, dataB) {
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var data = dataA.slice();
var length = dataB.length;
while (index < length) {
start = dataB[index];
end = dataB[index + 1] - 1;
if (start == end) {
data = dataRemove(data, start);
} else {
data = dataRemoveRange(data, start, end);
}
index += 2;
}
return result;
return data;
};
var add = function(destination, value) {
if (!isArray(destination)) {
throw TypeError('add(): The `destination` argument must be an array.');
var dataAddRange = function(data, rangeStart, rangeEnd) {
if (rangeEnd < rangeStart) {
throw Error(ERRORS.rangeOrder);
}
if (isNumber(value)) {
destination.push(Number(value));
return destination;
if (
rangeStart < 0x0 || rangeStart > 0x10FFFF ||
rangeEnd < 0x0 || rangeEnd > 0x10FFFF
) {
throw RangeError(ERRORS.codePointRange);
}
if (isString(value)) {
destination.push(symbolToCodePoint(value));
return destination;
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var added = false;
var length = data.length;
while (index < length) {
start = data[index];
end = data[index + 1];
if (added) {
// The range has already been added to the set; at this point, we just
// need to get rid of the following ranges in case they overlap.
// Check if this range can be combined with the previous range.
if (start == rangeEnd + 1) {
data.splice(index - 1, 2);
return data;
}
// Exit as soon as no more possibly overlapping pairs can be found.
if (start > rangeEnd) {
return data;
}
// E.g. `[0, 11, 12, 16]` and we’ve added 5-15, so we now have
// `[0, 16, 12, 16]`. Remove the `12,16` part, as it lies within the
// `0,16` range that was previously added.
if (start >= rangeStart && start <= rangeEnd) {
// `start` lies within the range that was previously added.
if (end > rangeStart && end - 1 <= rangeEnd) {
// `end` lies within the range that was previously added as well,
// so remove this pair.
data.splice(index, 2);
index -= 2;
// Note: we cannot `return` just yet, as there may still be other
// overlapping pairs.
} else {
// `start` lies within the range that was previously added, but
// `end` doesn’t. E.g. `[0, 11, 12, 31]` and we’ve added 5-15, so
// now we have `[0, 16, 12, 31]`. This must be written as `[0, 31]`.
// Remove the previously added `end` and the current `start`.
data.splice(index - 1, 2);
index -= 2;
}
// Note: we cannot return yet.
}
}
else if (start == rangeEnd + 1) {
data[index] = rangeStart;
return data;
}
// Check if a new pair must be inserted *before* the current one.
else if (start > rangeEnd) {
data.splice(index, 0, rangeStart, rangeEnd + 1);
return data;
}
else if (rangeStart >= start && rangeStart < end && rangeEnd + 1 <= end) {
// The new range lies entirely within an existing range pair. No action
// needed.
return data;
}
else if (
// E.g. `[0, 11]` and you add 5-15 → `[0, 16]`.
(rangeStart >= start && rangeStart < end) ||
// E.g. `[0, 3]` and you add 3-6 → `[0, 7]`.
end == rangeStart
) {
// Replace `end` with the new value.
data[index + 1] = rangeEnd + 1;
// Make sure the next range pair doesn’t overlap, e.g. `[0, 11, 12, 14]`
// and you add 5-15 → `[0, 16]`, i.e. remove the `12,14` part.
added = true;
// Note: we cannot `return` just yet.
}
index += 2;
}
if (isArray(value)) {
forEach(value, function(item) {
destination = add(destination, item);
});
return destination;
// The loop has finished without doing anything; add the new pair to the end
// of the data set.
if (!added) {
data.push(rangeStart, rangeEnd + 1);
}
return destination;
return data;
};
var remove = function(destination, value) {
if (!isArray(destination)) {
throw TypeError('remove(): The `destination` argument must be an array.');
var dataContains = function(data, codePoint) {
// Iterate over the data per `(start, end)` pair.
var index = 0;
var start;
var end;
var length = data.length;
while (index < length) {
start = data[index];
end = data[index + 1];
if (codePoint >= start && codePoint < end) {
return true;
}
index += 2;
}
if (isFunction(value)) {
var array = [];
forEach(destination, function(item) {
if (!value(item)) {
array.push(item);
}
});
return array;
return false;
};
var dataIntersection = function(data, codePoints) {
var index = 0;
var length = codePoints.length;
var codePoint;
var result = [];
while (index < length) {
codePoint = codePoints[index];
if (dataContains(data, codePoint)) {
result.push(codePoint);
}
++index;
}
if (isNumber(value)) {
return difference(destination, [value]);
return dataFromCodePoints(result);
};
var dataDifference = function(data, codePoints) {
var index = 0;
var length = codePoints.length;
var codePoint;
// Create a clone to avoid mutating the original `data`.
var newData = data.slice(0);
while (index < length) {
codePoint = codePoints[index];
if (dataContains(newData, codePoint)) {
newData = dataRemove(newData, codePoint);
}
++index;
}
if (isString(value)) {
return difference(destination, [symbolToCodePoint(value)]);
return newData;
};
var dataIsEmpty = function(data) {
return !data.length; | ||
}; | ||
var dataIsSingleton = function(data) { | ||
// Check if the set only represents a single code point. | ||
return data.length == 2 && data[0] + 1 == data[1]; | ||
}; | ||
var dataToArray = function(data) { | ||
// Iterate over the data per `(start, end)` pair. | ||
var index = 0; | ||
var start; | ||
var end; | ||
var result = []; | ||
var length = data.length; | ||
while (index < length) { | ||
start = data[index]; | ||
end = data[index + 1]; | ||
while (start < end) { | ||
result.push(start); | ||
++start; | ||
} | ||
index += 2; | ||
} | ||
if (isArray(value)) { | ||
forEach(value, function(item) { | ||
destination = remove(destination, item); | ||
}); | ||
return destination; | ||
} | ||
return destination; | ||
return result; | ||
}; | ||
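Version 0.6.0 replaces the flat `__codePoints__` array with `__data__`, a sorted array of half-open `[start, end)` pairs. `dataContains` tests `codePoint >= start && codePoint < end`, and `dataToArray` expands each pair with `while (start < end)`. Below is a minimal standalone sketch of that representation, assuming the pairs are sorted and non-overlapping. `contains`, `toArray` and `sketchAdd` are hypothetical names. `sketchAdd` is a simplified stand-in for `dataAdd`, not its actual implementation.

```javascript
// Sketch: a code point set stored as a flat array of half-open
// [start, end) pairs, sorted and non-overlapping.
// Example: [0x41, 0x44] represents U+0041..U+0043.

function contains(data, codePoint) {
  for (let i = 0; i < data.length; i += 2) {
    if (codePoint >= data[i] && codePoint < data[i + 1]) return true;
  }
  return false;
}

function toArray(data) {
  const result = [];
  for (let i = 0; i < data.length; i += 2) {
    for (let cp = data[i]; cp < data[i + 1]; cp++) result.push(cp);
  }
  return result;
}

// Hypothetical helper: add one code point and merge it with any overlapping
// or adjacent pairs, so the array stays sorted and non-overlapping.
// Returns a new array and leaves `data` untouched.
function sketchAdd(data, codePoint) {
  const result = [];
  let pending = [codePoint, codePoint + 1];
  let inserted = false;
  for (let i = 0; i < data.length; i += 2) {
    const start = data[i];
    const end = data[i + 1];
    if (end < pending[0]) {
      // Pair lies entirely before the pending pair, with a gap between them.
      result.push(start, end);
    } else if (start > pending[1]) {
      // Pair lies entirely after the pending pair: emit the pending pair first.
      if (!inserted) {
        result.push(pending[0], pending[1]);
        inserted = true;
      }
      result.push(start, end);
    } else {
      // Overlapping or adjacent: widen the pending pair to cover both.
      pending = [Math.min(start, pending[0]), Math.max(end, pending[1])];
    }
  }
  if (!inserted) result.push(pending[0], pending[1]);
  return result;
}
```

Because a whole range costs only two numbers, even a large set such as every astral code point (`[0x10000, 0x110000]`) stays tiny. The 0.5.4 array, by contrast, stored each code point individually.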
@@ -254,7 +502,13 @@
var highSurrogate = function(codePoint) { | ||
return parseInt(floor((codePoint - 0x10000) / 0x400) + 0xD800, 10); | ||
return parseInt( | ||
floor((codePoint - 0x10000) / 0x400) + HIGH_SURROGATE_MIN, | ||
10 | ||
); | ||
}; | ||
var lowSurrogate = function(codePoint) { | ||
return parseInt((codePoint - 0x10000) % 0x400 + 0xDC00, 10); | ||
return parseInt( | ||
(codePoint - 0x10000) % 0x400 + LOW_SURROGATE_MIN, | ||
10 | ||
); | ||
}; | ||
@@ -265,13 +519,55 @@ | ||
var string; | ||
if ( | ||
(codePoint >= 0x41 && codePoint <= 0x5A) || | ||
(codePoint >= 0x61 && codePoint <= 0x7A) || | ||
(codePoint >= 0x30 && codePoint <= 0x39) | ||
// http://mathiasbynens.be/notes/javascript-escapes#single | ||
if (codePoint == 0x08) { | ||
string = '\\b'; | ||
} | ||
else if (codePoint == 0x09) { | ||
string = '\\t'; | ||
} | ||
// Note: IE < 9 treats `'\v'` as `'v'`, so avoid using it. | ||
// else if (codePoint == 0x0B) { | ||
// string = '\\v'; | ||
// } | ||
else if (codePoint == 0x0A) { | ||
string = '\\n'; | ||
} | ||
else if (codePoint == 0x0C) { | ||
string = '\\f'; | ||
} | ||
else if (codePoint == 0x0D) { | ||
string = '\\r'; | ||
} | ||
else if (codePoint == 0x5C) { | ||
string = '\\\\'; | ||
} | ||
else if ( | ||
codePoint == 0x24 || | ||
(codePoint >= 0x28 && codePoint <= 0x2B) || | ||
codePoint == 0x2D || codePoint == 0x2E || codePoint == 0x3F || | ||
(codePoint >= 0x5B && codePoint <= 0x5E) || | ||
(codePoint >= 0x7B && codePoint <= 0x7D) | ||
) { | ||
// [a-zA-Z0-9] | ||
// The code point maps to an unsafe printable ASCII character; | ||
// backslash-escape it. Here’s the list of those symbols: | ||
// | ||
// $()*+-.?[\]^{|} | ||
// | ||
// See #7 for more info. | ||
string = '\\' + stringFromCharCode(codePoint); | ||
} | ||
else if (codePoint >= 0x20 && codePoint <= 0x7E) { | ||
// The code point maps to one of these printable ASCII symbols | ||
// (including the space character): | ||
// | ||
// !"#%&',/0123456789:;<=>@ABCDEFGHIJKLMNO | ||
// PQRSTUVWXYZ_`abcdefghijklmnopqrstuvwxyz~ | ||
// | ||
// These can safely be used directly. | ||
string = stringFromCharCode(codePoint); | ||
} else if (codePoint <= 0xFF) { | ||
} | ||
else if (codePoint <= 0xFF) { | ||
// http://mathiasbynens.be/notes/javascript-escapes#hexadecimal | ||
string = '\\x' + pad(hex(codePoint), 2); | ||
} else { // if (codePoint <= 0xFFFF) | ||
} | ||
else { // `codePoint <= 0xFFFF` holds true. | ||
// http://mathiasbynens.be/notes/javascript-escapes#unicode | ||
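The new `codePointToString` branches above choose an escape for each BMP code point:

- single-character escapes for a few control characters, avoiding `\v` because of old IE;
- a backslash before regex syntax characters;
- printable ASCII written as-is;
- `\xHH` up to U+00FF;
- `\uHHHH` beyond that.

The block below condenses those rules into a sketch. `escapeCodePoint` is a hypothetical name. Emitting uppercase hex digits is an assumption about the `hex` helper, which this diff does not show.

```javascript
// Sketch of the per-code-point escaping rules in the 0.6.0
// `codePointToString` branches, for BMP code points only.
const SINGLE_ESCAPES = new Map([
  [0x08, '\\b'],
  [0x09, '\\t'],
  // 0x0B is deliberately absent: IE < 9 treats '\v' as 'v'.
  [0x0A, '\\n'],
  [0x0C, '\\f'],
  [0x0D, '\\r'],
  [0x5C, '\\\\']
]);
// Printable ASCII symbols that get a leading backslash
// (the backslash itself is covered by SINGLE_ESCAPES).
const SPECIAL = '$()*+-.?[]^{|}';

function escapeCodePoint(codePoint) {
  if (SINGLE_ESCAPES.has(codePoint)) return SINGLE_ESCAPES.get(codePoint);
  const char = String.fromCharCode(codePoint);
  if (codePoint <= 0x7E && SPECIAL.includes(char)) return '\\' + char;
  // Remaining printable ASCII, including the space, is safe as-is.
  if (codePoint >= 0x20 && codePoint <= 0x7E) return char;
  const hex = codePoint.toString(16).toUpperCase(); // assumed uppercase
  if (codePoint <= 0xFF) return '\\x' + hex.padStart(2, '0');
  return '\\u' + hex.padStart(4, '0');
}
```

The real `toString` also post-processes its result with `regexNull`, rewriting `\x00` as `\0` wherever no digit follows.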
@@ -291,212 +587,382 @@ string = '\\u' + pad(hex(codePoint), 4);
// Based on `punycode.ucs2.decode`: http://mths.be/punycode | ||
var symbolToCodePoint = function(symbol) { | ||
var length = symbol.length; | ||
var value = symbol.charCodeAt(0); | ||
var extra; | ||
if ((value & 0xF800) == 0xD800 && length > 1) { | ||
// `value` is a high surrogate, and there is a next character — assume | ||
// it’s a low surrogate (else it’s invalid use of Regenerate anyway). | ||
extra = symbol.charCodeAt(1); | ||
return ((value & 0x3FF) << 10) + (extra & 0x3FF) + 0x10000; | ||
} else { | ||
return value; | ||
var first = symbol.charCodeAt(0); | ||
var second; | ||
if ( | ||
first >= HIGH_SURROGATE_MIN && first <= HIGH_SURROGATE_MAX && | ||
length > 1 // There is a next code unit. | ||
) { | ||
// `first` is a high surrogate, and there is a next character. Assume | ||
// it’s a low surrogate (else it’s invalid usage of Regenerate anyway). | ||
second = symbol.charCodeAt(1); | ||
// http://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae | ||
return (first - HIGH_SURROGATE_MIN) * 0x400 + | ||
second - LOW_SURROGATE_MIN + 0x10000; | ||
} | ||
return first; | ||
}; | ||
var createBMPCharacterClasses = function(codePoints) { | ||
var tmp = ''; | ||
var start = codePoints[0]; | ||
var end = start; | ||
var predict = start + 1; | ||
codePoints = codePoints.slice(1); | ||
var counter = 0; | ||
forEach(codePoints, function(code) { | ||
if (predict == code) { | ||
end = code; | ||
predict = code + 1; | ||
return; | ||
} | ||
var createBMPCharacterClasses = function(data) { | ||
// Iterate over the data per `(start, end)` pair. | ||
var result = ''; | ||
var index = 0; | ||
var start; | ||
var end; | ||
var length = data.length; | ||
if (dataIsSingleton(data)) { | ||
return codePointToString(data[0]); | ||
} | ||
while (index < length) { | ||
start = data[index]; | ||
end = data[index + 1] - 1; // Note: the `- 1` makes `end` inclusive. | ||
if (start == end) { | ||
tmp += codePointToString(start); | ||
counter += 1; | ||
} else if (end == start + 1) { | ||
tmp += codePointToString(start) + codePointToString(end); | ||
counter += 2; | ||
result += codePointToString(start); | ||
} else if (start + 1 == end) { | ||
result += codePointToString(start) + codePointToString(end); | ||
} else { | ||
tmp += codePointToString(start) + '-' + codePointToString(end); | ||
counter += 2; | ||
result += codePointToString(start) + '-' + codePointToString(end); | ||
} | ||
start = code; | ||
end = code; | ||
predict = code + 1; | ||
}); | ||
if (start == end) { | ||
tmp += codePointToString(start); | ||
counter += 1; | ||
} else if (end == start + 1) { | ||
tmp += codePointToString(start) + codePointToString(end); | ||
counter += 2; | ||
} else { | ||
tmp += codePointToString(start) + '-' + codePointToString(end); | ||
counter += 2; | ||
index += 2; | ||
} | ||
return '[' + result + ']'; | ||
}; | ||
if (counter == 1) { | ||
return tmp; | ||
} else { | ||
return '[' + tmp + ']'; | ||
var splitAtBMP = function(data) { | ||
// Iterate over the data per `(start, end)` pair. | ||
var loneHighSurrogates = []; | ||
var bmp = []; | ||
var astral = []; | ||
var index = 0; | ||
var start; | ||
var end; | ||
var length = data.length; | ||
while (index < length) { | ||
start = data[index]; | ||
end = data[index + 1] - 1; // Note: the `- 1` makes `end` inclusive. | ||
if (start <= 0xFFFF && end <= 0xFFFF) { | ||
// Both `start` and `end` are within the BMP range. | ||
if (start >= HIGH_SURROGATE_MIN && start <= HIGH_SURROGATE_MAX) { | ||
// `start` lies in the high surrogates range. | ||
if (end <= HIGH_SURROGATE_MAX) { | ||
loneHighSurrogates.push(start, end + 1); | ||
} else { | ||
loneHighSurrogates.push(start, HIGH_SURROGATE_MAX + 1); | ||
bmp.push(HIGH_SURROGATE_MAX + 1, end + 1); | ||
} | ||
} else if (end >= HIGH_SURROGATE_MIN && end <= HIGH_SURROGATE_MAX) { | ||
bmp.push(start, HIGH_SURROGATE_MIN); | ||
loneHighSurrogates.push(HIGH_SURROGATE_MIN, end + 1); | ||
} else if (start < HIGH_SURROGATE_MIN && end > HIGH_SURROGATE_MAX) { | ||
bmp.push(start, HIGH_SURROGATE_MIN, HIGH_SURROGATE_MAX + 1, end + 1); | ||
loneHighSurrogates.push(HIGH_SURROGATE_MIN, HIGH_SURROGATE_MAX + 1); | ||
} else { | ||
bmp.push(start, end + 1); | ||
} | ||
} | ||
else if (start <= 0xFFFF && end > 0xFFFF) { | ||
// `start` is in the BMP range, but `end` lies within the astral range. | ||
if (start >= HIGH_SURROGATE_MIN && start <= HIGH_SURROGATE_MAX) { | ||
// `start` lies in the high surrogates range. Since `end` is astral, | ||
// we can just add all high surrogates starting from `start` to | ||
// `loneHighSurrogates`, any other BMP code points to `bmp`, and the | ||
// remaining symbols to `astral`. | ||
loneHighSurrogates.push(start, HIGH_SURROGATE_MAX + 1); | ||
bmp.push(HIGH_SURROGATE_MAX + 1, 0xFFFF + 1); | ||
} else if (start < HIGH_SURROGATE_MIN) { | ||
bmp.push(start, HIGH_SURROGATE_MIN, HIGH_SURROGATE_MAX + 1, 0xFFFF + 1); | ||
loneHighSurrogates.push(HIGH_SURROGATE_MIN, HIGH_SURROGATE_MAX + 1); | ||
} else { // `start > HIGH_SURROGATE_MAX` holds true. | ||
bmp.push(start, 0xFFFF + 1); | ||
} | ||
astral.push(0xFFFF + 1, end + 1); | ||
} | ||
else { | ||
// Both `start` and `end` are in the astral range. | ||
astral.push(start, end + 1); | ||
} | ||
index += 2; | ||
} | ||
return { | ||
'loneHighSurrogates': loneHighSurrogates, | ||
'bmp': bmp, | ||
'astral': astral | ||
}; | ||
}; | ||
// In Regenerate output, `\0` will never be preceded by `\` because we sort | ||
// by code point value, so let’s keep this regular expression simple: | ||
var regexNull = /\\x00([^0123456789]|$)/g; | ||
var createCharacterClasses = function(codePoints) { | ||
// At this point, it’s safe to assume `codePoints` is a sorted array of | ||
// numeric code point values. | ||
var bmp = []; | ||
var astralMap = {}; | ||
var surrogates = []; | ||
var hasAstral = false; | ||
var optimizeSurrogateMappings = function(surrogateMappings) { | ||
var result = []; | ||
var tmpLow = []; | ||
var addLow = false; | ||
var mapping; | ||
var nextMapping; | ||
var highSurrogates; | ||
var lowSurrogates; | ||
var nextHighSurrogates; | ||
var nextLowSurrogates; | ||
var index = -1; | ||
var length = surrogateMappings.length; | ||
while (++index < length) { | ||
mapping = surrogateMappings[index]; | ||
nextMapping = surrogateMappings[index + 1]; | ||
if (!nextMapping) { | ||
result.push(mapping); | ||
continue; | ||
} | ||
highSurrogates = mapping[0]; | ||
lowSurrogates = mapping[1]; | ||
nextHighSurrogates = nextMapping[0]; | ||
nextLowSurrogates = nextMapping[1]; | ||
forEach(codePoints, function(codePoint) { | ||
if (codePoint >= 0xD800 && codePoint <= 0xDBFF) { | ||
// If a high surrogate is followed by a low surrogate, the two code | ||
// units should be matched together, so that the regex always matches a | ||
// full code point. For this reason, separate code points that are | ||
// (unmatched) high surrogates are tracked separately, so they can be | ||
// moved to the end if astral symbols are to be matched as well. | ||
surrogates.push(codePoint); | ||
} else if (codePoint >= 0x0000 && codePoint <= 0xFFFF) { | ||
// non-surrogate BMP code point | ||
bmp.push(codePoint); | ||
} else if (codePoint >= 0x010000 && codePoint <= 0x10FFFF) { | ||
// astral code point | ||
hasAstral = true; | ||
append( | ||
astralMap, | ||
highSurrogate(codePoint), | ||
lowSurrogate(codePoint) | ||
); | ||
} else { | ||
throw RangeError('Invalid code point value. Code points range from ' + | ||
'U+000000 to U+10FFFF.'); | ||
// Check for identical high surrogate ranges. | ||
tmpLow = lowSurrogates; | ||
while ( | ||
nextHighSurrogates && | ||
highSurrogates[0] == nextHighSurrogates[0] && | ||
highSurrogates[1] == nextHighSurrogates[1] | ||
) { | ||
// Merge with the next item. | ||
if (dataIsSingleton(nextLowSurrogates)) { | ||
tmpLow = dataAdd(tmpLow, nextLowSurrogates[0]); | ||
} else { | ||
tmpLow = dataAddRange( | ||
tmpLow, | ||
nextLowSurrogates[0], | ||
nextLowSurrogates[1] - 1 | ||
); | ||
} | ||
++index; | ||
mapping = surrogateMappings[index]; | ||
highSurrogates = mapping[0]; | ||
lowSurrogates = mapping[1]; | ||
nextMapping = surrogateMappings[index + 1]; | ||
nextHighSurrogates = nextMapping && nextMapping[0]; | ||
nextLowSurrogates = nextMapping && nextMapping[1]; | ||
addLow = true; | ||
} | ||
}); | ||
result.push([ | ||
highSurrogates, | ||
addLow ? tmpLow : lowSurrogates | ||
]); | ||
addLow = false; | ||
} | ||
return optimizeByLowSurrogates(result); | ||
}; | ||
var astralMapByLowRanges = {}; | ||
forOwn(astralMap, function(highSurrogate, lowSurrogate) { | ||
var bmpRange = createBMPCharacterClasses(lowSurrogate); | ||
append(astralMapByLowRanges, bmpRange, +highSurrogate); | ||
}); | ||
var tmp = []; | ||
// If we’re not dealing with any astral symbols, there’s no need to move | ||
// individual code points that are high surrogates to the end of the regex. | ||
if (!hasAstral && surrogates.length) { | ||
bmp = sortUniqueNumbers(bmp.concat(surrogates)); | ||
var optimizeByLowSurrogates = function(surrogateMappings) { | ||
if (surrogateMappings.length == 1) { | ||
return surrogateMappings; | ||
} | ||
if (bmp.length) { | ||
tmp.push(createBMPCharacterClasses(bmp)); | ||
var index = -1; | ||
var innerIndex = -1; | ||
while (++index < surrogateMappings.length) { | ||
var mapping = surrogateMappings[index]; | ||
var lowSurrogates = mapping[1]; | ||
var lowSurrogateStart = lowSurrogates[0]; | ||
var lowSurrogateEnd = lowSurrogates[1]; | ||
innerIndex = index; // Note: the loop starts at the next index. | ||
while (++innerIndex < surrogateMappings.length) { | ||
var otherMapping = surrogateMappings[innerIndex]; | ||
var otherLowSurrogates = otherMapping[1]; | ||
var otherLowSurrogateStart = otherLowSurrogates[0]; | ||
var otherLowSurrogateEnd = otherLowSurrogates[1]; | ||
if ( | ||
lowSurrogateStart == otherLowSurrogateStart && | ||
lowSurrogateEnd == otherLowSurrogateEnd | ||
) { | ||
// Add the code points in the other item to this one. | ||
if (dataIsSingleton(otherMapping[0])) { | ||
mapping[0] = dataAdd(mapping[0], otherMapping[0][0]); | ||
} else { | ||
mapping[0] = dataAddRange( | ||
mapping[0], | ||
otherMapping[0][0], | ||
otherMapping[0][1] - 1 | ||
); | ||
} | ||
// Remove the other, now redundant, item. | ||
surrogateMappings.splice(innerIndex, 1); | ||
--innerIndex; | ||
} | ||
} | ||
} | ||
forOwn(astralMapByLowRanges, function(lowSurrogate, highSurrogate) { | ||
tmp.push(createBMPCharacterClasses(highSurrogate) + lowSurrogate); | ||
}); | ||
// Individual code points that are high surrogates must go at the end | ||
// if astral symbols are to be matched as well. | ||
if (hasAstral && surrogates.length) { | ||
tmp.push(createBMPCharacterClasses(surrogates)); | ||
} | ||
return tmp | ||
.join('|') | ||
// Use `\0` instead of `\x00` where possible | ||
.replace(regexNull, '\\0$1'); | ||
return surrogateMappings; | ||
}; | ||
var fromCodePoints = function(codePoints) { | ||
if (!isArray(codePoints)) { | ||
throw TypeError('fromCodePoints(): The `codePoints` argument must be ' + | ||
'an array.'); | ||
var surrogateSet = function(data) { | ||
// Exit early if `data` is an empty set. | ||
if (!data.length) { | ||
return { | ||
'highSurrogatesData': [], | ||
'surrogateMappings': [] | ||
}; | ||
} | ||
if (!codePoints.length) { | ||
return ''; | ||
} | ||
// Iterate over the data per `(start, end)` pair. | ||
var index = 0; | ||
var start; | ||
var end; | ||
var startHigh; | ||
var startLow; | ||
var prevStartHigh = 0; | ||
var prevEndHigh = 0; | ||
var tmpLow = []; | ||
var endHigh; | ||
var endLow; | ||
var highSurrogatesData = []; | ||
var surrogateMappings = []; | ||
var length = data.length; | ||
var dataHigh = []; | ||
while (index < length) { | ||
start = data[index]; | ||
end = data[index + 1] - 1; | ||
codePoints = sortUniqueNumbers(codePoints); | ||
startHigh = highSurrogate(start); | ||
startLow = lowSurrogate(start); | ||
endHigh = highSurrogate(end); | ||
endLow = lowSurrogate(end); | ||
return createCharacterClasses(codePoints); | ||
}; | ||
var startsWithLowestLowSurrogate = startLow == LOW_SURROGATE_MIN; | ||
var endsWithHighestLowSurrogate = endLow == LOW_SURROGATE_MAX; | ||
var complete = false; | ||
var fromCodePointRange = function(start, end) { | ||
return createCharacterClasses(range(start, end)); | ||
}; | ||
// Append the previous high-surrogate-to-low-surrogate mappings. | ||
// Step 1: `(startHigh, startLow)` to `(startHigh, LOW_SURROGATE_MAX)`. | ||
if ( | ||
startHigh == endHigh || | ||
startsWithLowestLowSurrogate && endsWithHighestLowSurrogate | ||
) { | ||
highSurrogatesData = dataAddRange( | ||
highSurrogatesData, | ||
startHigh, | ||
endHigh | ||
); | ||
surrogateMappings.push([ | ||
[startHigh, endHigh + 1], | ||
[startLow, endLow + 1] | ||
]); | ||
complete = true; | ||
} else { | ||
highSurrogatesData = dataAdd( | ||
highSurrogatesData, | ||
startHigh | ||
); | ||
surrogateMappings.push([ | ||
[startHigh, startHigh + 1], | ||
[startLow, LOW_SURROGATE_MAX + 1] | ||
]); | ||
} | ||
var fromCodePointRanges = function(codePointRanges) { | ||
if (!isArray(codePointRanges)) { | ||
throw TypeError('fromCodePointRanges(): The `ranges` argument must be ' + | ||
'an array.'); | ||
} | ||
// Step 2: `(startHigh + 1, LOW_SURROGATE_MIN)` to | ||
// `(endHigh - 1, LOW_SURROGATE_MAX)`. | ||
if (!complete && startHigh + 1 < endHigh) { | ||
if (endsWithHighestLowSurrogate) { | ||
// Combine step 2 and step 3. | ||
highSurrogatesData = dataAddRange( | ||
highSurrogatesData, | ||
startHigh + 1, | ||
endHigh | ||
); | ||
surrogateMappings.push([ | ||
[startHigh + 1, endHigh + 1], | ||
[LOW_SURROGATE_MIN, endLow + 1] | ||
]); | ||
complete = true; | ||
} else { | ||
highSurrogatesData = dataAddRange( | ||
highSurrogatesData, | ||
startHigh + 1, | ||
endHigh - 1 | ||
); | ||
surrogateMappings.push([ | ||
[startHigh + 1, endHigh], | ||
[LOW_SURROGATE_MIN, LOW_SURROGATE_MAX + 1] | ||
]); | ||
} | ||
} | ||
if (!codePointRanges.length) { | ||
return ''; | ||
} | ||
// Step 3. `(endHigh, LOW_SURROGATE_MIN)` to `(endHigh, endLow)`. | ||
if (!complete) { | ||
highSurrogatesData = dataAdd( | ||
highSurrogatesData, | ||
endHigh | ||
); | ||
surrogateMappings.push([ | ||
[endHigh, endHigh + 1], | ||
[LOW_SURROGATE_MIN, endLow + 1] | ||
]); | ||
} | ||
return createCharacterClasses(ranges(codePointRanges)); | ||
}; | ||
prevStartHigh = startHigh; | ||
prevEndHigh = endHigh; | ||
var fromSymbols = function(symbols) { | ||
if (!isArray(symbols)) { | ||
throw TypeError('fromSymbols(): The `symbols` argument must be an ' + | ||
'array.'); | ||
index += 2; | ||
} | ||
if (!symbols.length) { | ||
return ''; | ||
} | ||
return { | ||
'highSurrogatesData': highSurrogatesData, | ||
'surrogateMappings': optimizeSurrogateMappings(surrogateMappings) | ||
// The format of `surrogateMappings` is as follows: | ||
// | ||
// [ surrogateMapping1, surrogateMapping2 ] | ||
// | ||
// i.e.: | ||
// | ||
// [ | ||
// [ highSurrogates1, lowSurrogates1 ], | ||
// [ highSurrogates2, lowSurrogates2 ] | ||
// ] | ||
}; | ||
}; | ||
var codePoints = map(symbols, symbolToCodePoint); | ||
// Sort code points numerically | ||
codePoints = codePoints.sort(function(a, b) { | ||
return a - b; | ||
var createSurrogateCharacterClasses = function(surrogateMappings) { | ||
var result = []; | ||
forEach(surrogateMappings, function(surrogateMapping) { | ||
var highSurrogates = surrogateMapping[0]; | ||
var lowSurrogates = surrogateMapping[1]; | ||
result.push( | ||
createBMPCharacterClasses(highSurrogates) + | ||
createBMPCharacterClasses(lowSurrogates) | ||
); | ||
}); | ||
return createCharacterClasses(codePoints); | ||
return result.join('|'); | ||
}; | ||
var fromSymbolRange = function(start, end) { | ||
return createCharacterClasses( | ||
range(symbolToCodePoint(start), symbolToCodePoint(end)) | ||
); | ||
}; | ||
var createCharacterClassesFromData = function(data) { | ||
var result = []; | ||
var fromSymbolRanges = function(symbolRanges) { | ||
if (!isArray(symbolRanges)) { | ||
throw TypeError('fromSymbolRanges(): The `ranges` argument must be an ' + | ||
'array.'); | ||
var parts = splitAtBMP(data); | ||
var loneHighSurrogates = parts.loneHighSurrogates; | ||
var bmp = parts.bmp; | ||
var astral = parts.astral; | ||
var hasAstral = !dataIsEmpty(parts.astral); | ||
var hasLoneSurrogates = !dataIsEmpty(loneHighSurrogates); | ||
var surrogatesData = surrogateSet(astral); | ||
var highSurrogatesData = surrogatesData.highSurrogatesData; | ||
var surrogateMappings = surrogatesData.surrogateMappings; | ||
// If we’re not dealing with any astral symbols, there’s no need to move | ||
// individual code points that are high surrogates to the end of the regex. | ||
if (!hasAstral && hasLoneSurrogates) { | ||
bmp = dataAddData(bmp, loneHighSurrogates); | ||
} | ||
if (!symbolRanges.length) { | ||
return ''; | ||
if (!dataIsEmpty(bmp)) { | ||
// The data set contains BMP code points that are not high surrogates | ||
// needed for astral code points in the set. | ||
result.push(createBMPCharacterClasses(bmp)); | ||
} | ||
var codePoints = []; | ||
forEach(symbolRanges, function(symbolRange) { | ||
// If it’s a single symbol (not a range) | ||
if (!isArray(symbolRange)) { | ||
codePoints.push(symbolToCodePoint(symbolRange)); | ||
return; | ||
} | ||
// If it’s a range (not a single code point) | ||
var start = symbolToCodePoint(symbolRange[0]); | ||
var stop = symbolToCodePoint(symbolRange[1]); | ||
codePoints = codePoints.concat(range(start, stop)); | ||
}); | ||
return createCharacterClasses(codePoints); | ||
if (surrogateMappings.length) { | ||
// The data set contains astral code points; append character classes | ||
// based on their surrogate pairs. | ||
result.push(createSurrogateCharacterClasses(surrogateMappings)); | ||
} | ||
if (hasAstral && hasLoneSurrogates) { | ||
// The data set contains lone high surrogates; append these. Lone high | ||
// surrogates must go at the end of the regex if astral symbols are to be | ||
// matched as well. | ||
result.push(createBMPCharacterClasses(loneHighSurrogates)); | ||
} | ||
return result.join('|'); | ||
}; | ||
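`highSurrogate`, `lowSurrogate` and the rewritten `symbolToCodePoint` implement the standard UTF-16 surrogate formulas. The new code expresses them with the named `HIGH_SURROGATE_MIN` and `LOW_SURROGATE_MIN` constants instead of the old bit masks. The sketch below restates them as a round trip, using U+1D306 as a sample astral code point. The constant values 0xD800, 0xDBFF and 0xDC00 are the standard surrogate bounds and are assumed to match the constants defined elsewhere in the file.

```javascript
// Surrogate-pair arithmetic mirroring `highSurrogate`, `lowSurrogate`
// and `symbolToCodePoint` from the 0.6.0 diff.
const HIGH_SURROGATE_MIN = 0xD800;
const HIGH_SURROGATE_MAX = 0xDBFF;
const LOW_SURROGATE_MIN = 0xDC00;

// Split an astral code point (U+10000..U+10FFFF) into its two code units.
function highSurrogate(codePoint) {
  return Math.floor((codePoint - 0x10000) / 0x400) + HIGH_SURROGATE_MIN;
}

function lowSurrogate(codePoint) {
  return ((codePoint - 0x10000) % 0x400) + LOW_SURROGATE_MIN;
}

// Decode the first symbol of a string. Like the original, this assumes a
// high surrogate followed by another code unit forms a valid pair.
function symbolToCodePoint(symbol) {
  const first = symbol.charCodeAt(0);
  if (
    first >= HIGH_SURROGATE_MIN && first <= HIGH_SURROGATE_MAX &&
    symbol.length > 1
  ) {
    const second = symbol.charCodeAt(1);
    return (first - HIGH_SURROGATE_MIN) * 0x400 +
      second - LOW_SURROGATE_MIN + 0x10000;
  }
  // BMP symbol or lone high surrogate: the code unit is the code point.
  return first;
}
```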
@@ -506,67 +972,129 @@
var Set = function(value) { | ||
this.__codePoints__ = []; | ||
return this; | ||
// `regenerate` can be used as a constructor (and new methods can be added to | ||
// its prototype) but also as a regular function, the latter of which is the | ||
// documented and most common usage. For that reason, it’s not capitalized. | ||
var regenerate = function(value) { | ||
if (arguments.length > 1) { | ||
value = slice.call(arguments); | ||
} | ||
if (this instanceof regenerate) { | ||
this.__data__ = []; | ||
return value ? this.add(value) : this; | ||
} | ||
return (new regenerate).add(value); | ||
}; | ||
var proto = Set.prototype; | ||
var proto = regenerate.prototype; | ||
extend(proto, { | ||
'add': function(value) { | ||
var $this = this; | ||
if (value == null) { | ||
return $this; | ||
} | ||
if (value instanceof regenerate) { | ||
// Allow passing other Regenerate instances. | ||
$this.__data__ = dataAddData($this.__data__, value.__data__); | ||
return $this; | ||
} | ||
if (arguments.length > 1) { | ||
value = slice.call(arguments); | ||
} | ||
this.__codePoints__ = add(this.__codePoints__, value); | ||
return this; | ||
if (isArray(value)) { | ||
forEach(value, function(item) { | ||
$this.add(item); | ||
}); | ||
return $this; | ||
} | ||
$this.__data__ = dataAdd( | ||
$this.__data__, | ||
isNumber(value) ? value : symbolToCodePoint(value) | ||
); | ||
return $this; | ||
}, | ||
'remove': function(value) { | ||
var $this = this; | ||
if (value == null) { | ||
return $this; | ||
} | ||
if (value instanceof regenerate) { | ||
// Allow passing other Regenerate instances. | ||
$this.__data__ = dataRemoveData($this.__data__, value.__data__); | ||
return $this; | ||
} | ||
if (arguments.length > 1) { | ||
value = slice.call(arguments); | ||
} | ||
this.__codePoints__ = remove(this.__codePoints__, value); | ||
return this; | ||
if (isArray(value)) { | ||
forEach(value, function(item) { | ||
$this.remove(item); | ||
}); | ||
return $this; | ||
} | ||
$this.__data__ = dataRemove( | ||
$this.__data__, | ||
isNumber(value) ? value : symbolToCodePoint(value) | ||
); | ||
return $this; | ||
}, | ||
'addRange': function(start, end) { | ||
this.__codePoints__ = add(this.__codePoints__, range( | ||
var $this = this; | ||
$this.__data__ = dataAddRange($this.__data__, | ||
isNumber(start) ? start : symbolToCodePoint(start), | ||
isNumber(end) ? end : symbolToCodePoint(end) | ||
)); | ||
return this; | ||
); | ||
return $this; | ||
}, | ||
'removeRange': function(start, end) { | ||
var $this = this; | ||
var startCodePoint = isNumber(start) ? start : symbolToCodePoint(start); | ||
var endCodePoint = isNumber(end) ? end : symbolToCodePoint(end); | ||
var array = []; | ||
forEach(this.__codePoints__, function(codePoint) { | ||
if (codePoint < startCodePoint || codePoint > endCodePoint) { | ||
array.push(codePoint); | ||
} | ||
}); | ||
this.__codePoints__ = array; | ||
return this; | ||
$this.__data__ = dataRemoveRange( | ||
$this.__data__, | ||
startCodePoint, | ||
endCodePoint | ||
); | ||
return $this; | ||
}, | ||
'difference': function(array) { | ||
this.__codePoints__ = difference(this.__codePoints__, array); | ||
return this; | ||
'difference': function(argument) { | ||
var $this = this; | ||
// Allow passing other Regenerate instances. TODO: Optimize this by | ||
// writing and using `dataDifferenceData()` here when appropriate. | ||
var array = argument instanceof regenerate ? | ||
dataToArray(argument.__data__) : | ||
argument; | ||
$this.__data__ = dataDifference($this.__data__, array); | ||
// TODO: allow non-code point values (i.e. strings or arrays) here? | ||
return $this; | ||
}, | ||
'intersection': function(array) { | ||
this.__codePoints__ = intersection(this.__codePoints__, array); | ||
return this; | ||
'intersection': function(argument) { | ||
var $this = this; | ||
// Allow passing other Regenerate instances. | ||
// TODO: Optimize this by writing and using `dataIntersectionData()`. | ||
var array = argument instanceof regenerate ? | ||
dataToArray(argument.__data__) : | ||
argument; | ||
$this.__data__ = dataIntersection($this.__data__, array); | ||
return $this; | ||
}, | ||
'contains': function(codePoint) { | ||
return contains( | ||
this.__codePoints__, | ||
return dataContains( | ||
this.__data__, | ||
isNumber(codePoint) ? codePoint : symbolToCodePoint(codePoint) | ||
); | ||
}, | ||
'clone': function() { | ||
var set = new regenerate; | ||
set.__data__ = this.__data__.slice(0); | ||
return set; | ||
}, | ||
'toString': function() { | ||
this.__codePoints__ = sortUniqueNumbers(this.__codePoints__); | ||
return createCharacterClasses(this.__codePoints__); | ||
var result = createCharacterClassesFromData(this.__data__); | ||
// Use `\0` instead of `\x00` where possible. | ||
return result.replace(regexNull, '\\0$1'); | ||
}, | ||
'toRegExp': function() { | ||
this.__codePoints__ = sortUniqueNumbers(this.__codePoints__); | ||
return RegExp(createCharacterClasses(this.__codePoints__)); | ||
return RegExp(this.toString()); | ||
}, | ||
'valueOf': function() { // has alias `toArray` | ||
this.__codePoints__ = sortUniqueNumbers(this.__codePoints__); | ||
return this.__codePoints__; | ||
'valueOf': function() { // Note: `valueOf` is aliased as `toArray`. | ||
return dataToArray(this.__data__); | ||
} | ||
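In 0.6.0, `regenerate` doubles as both constructor and factory. The check `this instanceof regenerate` decides whether to initialise `this.__data__` or to delegate to `new regenerate`. Every mutator returns `$this`, so calls can be chained. A compact sketch of that pattern follows. `CodePointSet` is hypothetical, and to keep the example short it stores a plain sorted array of code points, closer to the 0.5.4 `__codePoints__` approach.

```javascript
// Sketch of a function usable both with and without `new`, whose
// methods return `this` for chaining. Storage is simplified on purpose.
function CodePointSet(...values) {
  if (!(this instanceof CodePointSet)) {
    // Called as a plain function: delegate to the constructor.
    return new CodePointSet(...values);
  }
  this.codePoints = [];
  this.add(...values);
}

// Accept numbers, single-symbol strings, or arrays of either.
function toCodePoint(value) {
  return typeof value === 'number' ? value : value.codePointAt(0);
}

CodePointSet.prototype.add = function (...values) {
  for (const value of values.flat()) {
    if (value == null) continue; // ignore null/undefined, like the original
    const codePoint = toCodePoint(value);
    if (!this.codePoints.includes(codePoint)) this.codePoints.push(codePoint);
  }
  this.codePoints.sort((a, b) => a - b);
  return this;
};

CodePointSet.prototype.remove = function (...values) {
  const drop = new Set(values.flat().filter((v) => v != null).map(toCodePoint));
  this.codePoints = this.codePoints.filter((cp) => !drop.has(cp));
  return this;
};

CodePointSet.prototype.toArray = function () {
  return this.codePoints.slice();
};
```

Usage: `CodePointSet(0x42, 0x41).add('c').remove(0x42).toArray()` yields `[0x41, 0x63]`.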
@@ -577,31 +1105,4 @@ });
var set = function(value) { | ||
if (value instanceof Set) { | ||
// this is already a set; don’t wrap it again | ||
return value; | ||
} else if (arguments.length > 1) { | ||
value = slice.call(arguments); | ||
} | ||
return (new Set).add(value); | ||
}; | ||
regenerate.version = '0.6.0'; | ||
extend(set, { | ||
'version': '0.5.4', | ||
'fromCodePoints': fromCodePoints, | ||
'fromCodePointRange': fromCodePointRange, | ||
'fromCodePointRanges': fromCodePointRanges, | ||
'fromSymbols': fromSymbols, | ||
'fromSymbolRange': fromSymbolRange, | ||
'fromSymbolRanges': fromSymbolRanges, | ||
'range': range, | ||
'ranges': ranges, | ||
'contains': contains, | ||
'difference': difference, | ||
'intersection': intersection, | ||
'add': add, | ||
'remove': remove | ||
}); | ||
var regenerate = set; | ||
// Some AMD build optimizers, like r.js, check for specific condition patterns | ||
@@ -608,0 +1109,0 @@ // like the following:
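The rewritten `createBMPCharacterClasses` in the diff above walks the `(start, end)` pairs directly instead of re-scanning individual code points. It renders each pair as follows:

- a lone code point is written as-is;
- two adjacent code points are listed side by side;
- longer runs become `start-end`.

The result is wrapped in square brackets unless the whole set is a singleton. The sketch below mirrors those branches for BMP-only data. `bmpCharacterClass` and its `escapeForClass` helper are hypothetical, and `escapeForClass` is deliberately cruder than the real `codePointToString`.

```javascript
// Minimal escaper for this sketch: ASCII letters and digits are emitted
// literally, everything else as \uXXXX.
function escapeForClass(codePoint) {
  const char = String.fromCharCode(codePoint);
  return /[A-Za-z0-9]/.test(char)
    ? char
    : '\\u' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// Render half-open [start, end) pairs (all within the BMP) as a regex
// character class, following the branches of `createBMPCharacterClasses`.
function bmpCharacterClass(data) {
  if (data.length === 2 && data[0] + 1 === data[1]) {
    // A single code point needs no brackets.
    return escapeForClass(data[0]);
  }
  let result = '';
  for (let i = 0; i < data.length; i += 2) {
    const start = data[i];
    const end = data[i + 1] - 1; // make `end` inclusive
    if (start === end) {
      result += escapeForClass(start);
    } else if (start + 1 === end) {
      // Two adjacent code points: `ab` is shorter than `a-b`.
      result += escapeForClass(start) + escapeForClass(end);
    } else {
      result += escapeForClass(start) + '-' + escapeForClass(end);
    }
  }
  return '[' + result + ']';
}
```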
License Policy Violation
License: This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package
Major refactor
Supply chain risk: Package has recently undergone a major refactor. It may be unstable or indicate significant internal changes. Use caution when updating to versions that include significant changes.
Found 1 instance in 1 package
Dynamic require
Supply chain risk: Dynamic require can indicate the package is performing dangerous or unsafe dynamic code execution.
Found 1 instance in 1 package