What is regenerate-unicode-properties?
The regenerate-unicode-properties package is a collection of precomputed Regenerate sets for Unicode properties. Each set contains the code points that share a given property value, such as a general category or a script, and can be converted to a JavaScript regular expression. This is useful for text processing tasks that need to match characters with specific Unicode attributes.
What are regenerate-unicode-properties's main functionalities?
Generating regex sets for Unicode categories
The package ships one precomputed Regenerate set per General_Category value, such as Letter (category 'L'). Load the set you need and convert it to a regular expression.
const Letter = require('regenerate-unicode-properties/General_Category/Letter.js').characters;
const regex = Letter.toRegExp();
// regex will match any Unicode letter
Generating regex sets for Unicode scripts
Sets are also available per Unicode script, such as Hiragana. Load the script's set and convert it to a regular expression.
const Hiragana = require('regenerate-unicode-properties/Script/Hiragana.js').characters;
const regex = Hiragana.toRegExp();
// regex will match any character in the Hiragana script
Generating regex sets for Unicode blocks
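The package does not appear to ship data for Unicode blocks; the README below only documents properties such as General_Category, Script_Extensions, and Property_of_Strings. A block is a single contiguous range of code points, so it can be built directly with the underlying regenerate package. Basic Latin, for example, spans U+0000 to U+007F.
const regenerate = require('regenerate');
const regex = regenerate().addRange(0x0000, 0x007F).toRegExp();
// regex will match any character in the Basic Latin block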
Other packages similar to regenerate-unicode-properties
unicode-regex
The unicode-regex package allows you to create regular expressions based on Unicode properties, similar to regenerate-unicode-properties. It provides a higher-level API for composing regular expressions using Unicode property escapes.
xregexp
XRegExp is an extended JavaScript regex library that includes support for additional syntax and features, such as named capture groups and Unicode property escapes. It offers similar functionality for matching Unicode properties but also includes a wide range of other enhancements to regular expressions.
regexpu-core
regexpu-core is a library that compiles Unicode-aware regular expressions to ES5. It transforms Unicode property escapes and other Unicode-related regex features to work in older environments. It provides similar Unicode property matching capabilities but focuses on compatibility with non-ES2015 environments.
regenerate-unicode-properties
regenerate-unicode-properties is a collection of Regenerate sets for various Unicode properties.
Installation
To use regenerate-unicode-properties programmatically, install it as a dependency via npm:
$ npm install regenerate-unicode-properties
Usage
To get a map of supported properties and their values:
const properties = require('regenerate-unicode-properties');
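One use for this map is checking whether a property/value pair is supported before trying to load its set. The sketch below assumes the export is a Map whose keys are property names and whose values are iterables of value names; `isSupported` and `sampleProperties` are hypothetical names, and the sample data is a heavily truncated stand-in so the example runs without the package installed.

```javascript
// Hypothetical, truncated stand-in for require('regenerate-unicode-properties').
// Assumption: the real export is a Map of property name -> list of value names.
const sampleProperties = new Map([
  ['General_Category', ['Uppercase_Letter', 'Lowercase_Letter']],
  ['Script_Extensions', ['Greek', 'Latin']],
]);

// Return true when `property` is a key of the map and `value` is listed under it.
function isSupported(properties, property, value) {
  if (!properties.has(property)) {
    return false;
  }
  // Array.from() accepts arrays and Sets alike, so either shape works here.
  return Array.from(properties.get(property)).includes(value);
}

console.log(isSupported(sampleProperties, 'Script_Extensions', 'Greek'));
console.log(isSupported(sampleProperties, 'Block', 'Basic_Latin'));
```

With the package installed, the real map would be passed in place of `sampleProperties`.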
To get a specific Regenerate set:
const Lu = require('regenerate-unicode-properties/General_Category/Uppercase_Letter.js').characters;
const Greek = require('regenerate-unicode-properties/Script_Extensions/Greek.js').characters;
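The require paths above follow a `<Property>/<Value>.js` layout, so a set can also be loaded by name at runtime. This is a minimal sketch under that assumption; `propertyModulePath` and `loadCharacters` are hypothetical helpers, not part of the package.

```javascript
// Build the module path for a property/value pair, following the
// <Property>/<Value>.js layout seen in the require() calls above.
function propertyModulePath(property, value) {
  return `regenerate-unicode-properties/${property}/${value}.js`;
}

// Try to load the Regenerate set; return null if the module cannot be
// resolved (package not installed, or an unknown property/value pair).
function loadCharacters(property, value) {
  try {
    return require(propertyModulePath(property, value)).characters;
  } catch (error) {
    return null;
  }
}

const greek = loadCharacters('Script_Extensions', 'Greek');
if (greek !== null) {
  // toRegExp() is provided by Regenerate sets.
  console.log(greek.toRegExp().test('α'));
}
```

Returning null instead of throwing keeps the caller's control flow simple when property names come from user input.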
Some properties can also refer to strings rather than single characters:
const { characters, strings } = require('regenerate-unicode-properties/Property_of_Strings/Basic_Emoji.js');
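To match such a property with a single regular expression, the multi-character strings and the single-character set have to be combined. The sketch below shows one way to do that, assuming `strings` is an iterable of plain strings and that the character class comes from the set's `toString()` output; `buildPattern` and `escapeForRegExp` are hypothetical helpers, and the sample inputs are placeholders rather than real emoji data.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Combine a character-class pattern with an iterable of literal strings.
// Longer strings come first so that a shorter prefix does not win the match.
function buildPattern(characterClass, strings) {
  const alternatives = Array.from(strings)
    .sort((a, b) => b.length - a.length)
    .map(escapeForRegExp);
  alternatives.push(characterClass);
  return `(?:${alternatives.join('|')})`;
}

// Placeholder inputs; with the package installed these would be
// characters.toString() and strings from the require() call above.
const pattern = buildPattern('[a-c]', ['xy', 'xyz', '*z']);
const regex = new RegExp(`^${pattern}$`);
console.log(pattern);
```

Escaping matters because string sequences can start with regex metacharacters; keycap emoji sequences, for instance, can begin with `*`.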
To get the Unicode version the data was based on:
const unicodeVersion = require('regenerate-unicode-properties/unicode-version.js');
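The version is handy for checking whether the data is newer or older than another Unicode data source, such as the runtime's own tables. The sketch below assumes the module exports a dotted version string; `compareUnicodeVersions` is a hypothetical helper, and the literal version is a placeholder rather than the actual value.

```javascript
// Compare two dotted version strings numerically.
// Returns -1 if a < b, 0 if they are equal, 1 if a > b.
// Missing components count as 0, so '15.1' equals '15.1.0'.
function compareUnicodeVersions(a, b) {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const diff = (partsA[i] || 0) - (partsB[i] || 0);
    if (diff !== 0) {
      return Math.sign(diff);
    }
  }
  return 0;
}

// Placeholder for the value exported by unicode-version.js.
const dataVersion = '15.1.0';

// Node.js builds with ICU report their Unicode version here.
const runtimeVersion =
  typeof process !== 'undefined' && process.versions && process.versions.unicode;

if (runtimeVersion) {
  console.log(compareUnicodeVersions(dataVersion, runtimeVersion));
}
```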
For maintainers
How to publish a new release
- On the main branch, bump the version number in package.json:
npm version patch -m 'Release v%s'
Instead of patch, use minor or major as needed.
Note that this produces a Git commit + tag.
- Push the release commit and tag:
git push && git push --tags
Our CI then automatically publishes the new release to npm.
Author
License
regenerate-unicode-properties is available under the MIT license.