wink-tokenizer


wink-tokenizer - npm Package Compare versions

Comparing version 2.2.0 to 2.3.0

package.json
{
"name": "wink-tokenizer",
- "version": "2.2.0",
+ "version": "2.3.0",
"description": "Multilingual tokenizer that automatically tags each token with its type",

@@ -5,0 +5,0 @@ "keywords": [

@@ -39,2 +39,4 @@ // wink-tokenizer

var rgxWord = /[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\'[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]{1,2}|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+s\'|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/gi;
+ // Symbols go here.
+ var rgxSymbol = /[\~\@\#\%\^\+\=\*\|<>&]/g;
// Special regex to handle not elisions at sentence level itself.

@@ -56,3 +58,4 @@ var rgxNotElision = /([a-z])(n\'t)\b/gi;

{ regex: rgxWord, category: 'word' },
- { regex: rgxPunctuation, category: 'punctuation' }
+ { regex: rgxPunctuation, category: 'punctuation' },
+ { regex: rgxSymbol, category: 'symbol' }
];

@@ -69,2 +72,3 @@ // Used to generate finger print from the tokens.

currency: 'r',
+ // symbol: 's',
time: 't',

@@ -200,3 +204,3 @@ url: 'u',

* @param {object} config — It defines 0 or more properties from the list of
- * **12** properties. A true value for a property ensures tokenization
+ * **13** properties. A true value for a property ensures tokenization
* for that type of text; whereas false value will mean that the tokenization of that

@@ -221,11 +225,12 @@ * type of text will not be attempted.

* @param {boolean} [config.quoted_phrase=true] any **"quoted text"** in the sentence. (**`q`**)
+ * @param {boolean} [config.symbol=true] for example **`~`** or **`+`** or **`&`** or **`%`** ( token becomes fingerprint )
* @param {boolean} [config.time=true] common representation of time such as **4pm** or **16:00 hours** (**`t`**)
* @param {boolean} [config.mention=true] **@mention** as in github or twitter (**`m`**)
* @param {boolean} [config.url=true] URL such as **https://github.com** (**`u`**)
- * @param {boolean} [config.word=true] word such as **faster** or **dog's** or **cats'** (**`w`**)
- * @return {number} number of properties set to true from the list of above 12.
+ * @param {boolean} [config.word=true] word such as **faster** or **résumé** or **prévenir** (**`w`**)
+ * @return {number} number of properties set to true from the list of above 13.
* @example
* // Do not tokenize & tag @mentions.
* var myTokenizer.defineConfig( { mention: false } );
- * // -> 11
+ * // -> 12
* // Only tokenize words as defined above.

@@ -232,0 +237,0 @@ * var myTokenizer.defineConfig( {} );
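
The main functional change in this diff is the new `rgxSymbol` pattern and its `symbol` category. As a rough illustration of what that pattern matches on its own, here is a minimal sketch that copies the regex from the 2.3.0 side of the diff and applies it to a sample string. The `findSymbols` helper is hypothetical and not part of wink-tokenizer. In the real tokenizer this regex is the last entry in the list, so other patterns may well claim characters such as `@` or `#` (for example in mentions or hashtags) before the symbol pattern sees them.

```javascript
// Symbol regex copied verbatim from the 2.3.0 side of the diff.
var rgxSymbol = /[\~\@\#\%\^\+\=\*\|<>&]/g;

// Hypothetical helper (not part of wink-tokenizer): returns every
// character in `text` that the symbol regex matches, in order.
function findSymbols(text) {
  // String.prototype.match with a global regex returns all matches,
  // or null when there are none.
  return text.match(rgxSymbol) || [];
}

console.log(findSymbols('a + b = 100% & more'));
// -> [ '+', '=', '%', '&' ]
console.log(findSymbols('plain words only'));
// -> []
```

Note that punctuation such as `.` or `,` is not in the character class, so it is left to the existing `punctuation` category.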
