wink-tokenizer
Comparing version 2.2.0 to 2.3.0
 {
 "name": "wink-tokenizer",
-"version": "2.2.0",
+"version": "2.3.0",
 "description": "Multilingual tokenizer that automatically tags each token with its type",
@@ -5,0 +5,0 @@ "keywords": [
@@ -39,2 +39,4 @@ // wink-tokenizer
 var rgxWord = /[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\'[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]{1,2}|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+s\'|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/gi;
+// Symbols go here.
+var rgxSymbol = /[\~\@\#\%\^\+\=\*\|<>&]/g;
 // Special regex to handle not elisions at sentence level itself.
@@ -56,3 +58,4 @@ var rgxNotElision = /([a-z])(n\'t)\b/gi;
 { regex: rgxWord, category: 'word' },
-{ regex: rgxPunctuation, category: 'punctuation' }
+{ regex: rgxPunctuation, category: 'punctuation' },
+{ regex: rgxSymbol, category: 'symbol' }
 ];
@@ -69,2 +72,3 @@ // Used to generate finger print from the tokens.
 currency: 'r',
+// symbol: 's',
 time: 't',
@@ -200,3 +204,3 @@ url: 'u',
 * @param {object} config — It defines 0 or more properties from the list of
-* **12** properties. A true value for a property ensures tokenization
+* **13** properties. A true value for a property ensures tokenization
 * for that type of text; whereas false value will mean that the tokenization of that
@@ -221,11 +225,12 @@ * type of text will not be attempted.
 * @param {boolean} [config.quoted_phrase=true] any **"quoted text"** in the sentence. (**`q`**)
+* @param {boolean} [config.symbol=true] for example **`~`** or **`+`** or **`&`** or **`%`** ( token becomes fingerprint )
 * @param {boolean} [config.time=true] common representation of time such as **4pm** or **16:00 hours** (**`t`**)
 * @param {boolean} [config.mention=true] **@mention** as in github or twitter (**`m`**)
 * @param {boolean} [config.url=true] URL such as **https://github.com** (**`u`**)
-* @param {boolean} [config.word=true] word such as **faster** or **dog's** or **cats'** (**`w`**)
-* @return {number} number of properties set to true from the list of above 12.
+* @param {boolean} [config.word=true] word such as **faster** or **résumé** or **prévenir** (**`w`**)
+* @return {number} number of properties set to true from the list of above 13.
 * @example
 * // Do not tokenize & tag @mentions.
 * var myTokenizer.defineConfig( { mention: false } );
-* // -> 11
+* // -> 12
 * // Only tokenize words as defined above.
@@ -232,0 +237,0 @@ * var myTokenizer.defineConfig( {} );
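The new `symbol` category in 2.3.0 is driven by the `rgxSymbol` character class added in this diff. As a rough illustration of which characters that class matches, the sketch below runs the regex on its own, outside the tokenizer. The helper `findSymbols` is hypothetical and not part of wink-tokenizer; inside the library the regexes are applied in a fixed sequence, so a character such as `@` may already be consumed by an earlier category (for example a mention) before `rgxSymbol` is tried.

```javascript
// Regex copied verbatim from the 2.3.0 side of the diff.
var rgxSymbol = /[\~\@\#\%\^\+\=\*\|<>&]/g;

// Hypothetical helper for illustration only (not a wink-tokenizer API):
// returns every character in `text` that the symbol class matches.
function findSymbols(text) {
  // With the `g` flag, String.prototype.match returns all matches,
  // or null when there are none.
  return text.match(rgxSymbol) || [];
}

console.log(findSymbols('a + b = c & d ~ 5%'));
// -> [ '+', '=', '&', '~', '%' ]
```

Punctuation such as `.` or `,` is not in the class, so it stays with the existing `punctuation` category.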