wink-tokenizer (npm package): compare versions

Comparing version 2.0.1 to 2.1.0


package.json
{
"name": "wink-tokenizer",
-"version": "2.0.1",
-"description": "Versatile tokenizer that automatically tags each token with its type",
+"version": "2.1.0",
+"description": "Multilingual tokenizer that automatically tags each token with its type",
"keywords": [

@@ -16,2 +16,7 @@ "Tokenizer",

"Emoticon",
+"Multilingual",
+"French",
+"German",
+"Spanish",
+"Icelandic",
"wink"

@@ -18,0 +23,0 @@ ],

# wink-tokenizer
-Versatile tokenizer that automatically tags each token with its type
+Multilingual tokenizer that automatically tags each token with its type
-### [![Build Status](https://api.travis-ci.org/winkjs/wink-tokenizer.svg?branch=master)](https://travis-ci.org/winkjs/wink-tokenizer) [![Coverage Status](https://coveralls.io/repos/github/winkjs/wink-tokenizer/badge.svg?branch=master)](https://coveralls.io/github/winkjs/wink-tokenizer?branch=master) [![devDependencies Status](https://david-dm.org/winkjs/wink-tokenizer/dev-status.svg)](https://david-dm.org/winkjs/wink-tokenizer?type=dev)
+### [![Build Status](https://api.travis-ci.org/winkjs/wink-tokenizer.svg?branch=master)](https://travis-ci.org/winkjs/wink-tokenizer) [![Coverage Status](https://coveralls.io/repos/github/winkjs/wink-tokenizer/badge.svg?branch=master)](https://coveralls.io/github/winkjs/wink-tokenizer?branch=master) [![Inline docs](http://inch-ci.org/github/winkjs/wink-tokenizer.svg?branch=master)](http://inch-ci.org/github/winkjs/wink-tokenizer) [![devDependencies Status](https://david-dm.org/winkjs/wink-tokenizer/dev-status.svg)](https://david-dm.org/winkjs/wink-tokenizer?type=dev)
[<img align="right" src="https://decisively.github.io/wink-logos/logo-title.png" width="100px" >](http://winkjs.org/)
-Tokenize sentences and also automatically tag each token as either word, email, twitter handle, or more using **`wink-tokenizer`**. It is part of [wink](http://winkjs.org/), a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.
+Tokenize sentences in English, French, German, Spanish, and Icelandic using **`wink-tokenizer`**. It is part of [wink](http://winkjs.org/), a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.
+It automatically tags each token with its type, such as word, email, or Twitter handle.
### Installation

@@ -17,3 +19,3 @@

-### Example
+### Getting Started
```javascript

@@ -24,3 +26,4 @@ // Load tokenizer.

var myTokenizer = tokenizer();
-// Just tokenize the sentence...
+// Tokenize a tweet.
var s = '@superman: hit me up on my email r2d2@gmail.com, 2 of us plan party🎉 tom at 3pm:) #fun';

@@ -49,2 +52,12 @@ myTokenizer.tokenize( s );

// { value: '#fun', tag: 'hashtag' } ]
+// Tokenize a French sentence.
+s = 'Mieux vaut prévenir que guérir:-)';
+myTokenizer.tokenize( s );
+// -> [ { value: 'Mieux', tag: 'word' },
+// { value: 'vaut', tag: 'word' },
+// { value: 'prévenir', tag: 'word' },
+// { value: 'que', tag: 'word' },
+// { value: 'guérir', tag: 'word' },
+// { value: ':-)', tag: 'emoticon' } ]
```

@@ -61,5 +74,5 @@

-**wink-tokenizer** is copyright 2017 [GRAYPE Systems Private Limited](http://graype.in/).
+**wink-tokenizer** is copyright 2017-18 [GRAYPE Systems Private Limited](http://graype.in/).
It is licensed under the terms of the GNU Affero General Public License as published by the Free
Software Foundation, version 3 of the License.
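The README says each token is tagged with its type (word, email, hashtag, emoticon, and so on). The sketch below shows one rough way such tagging can work: at each position, try a list of regular expressions in priority order. It is a hypothetical, simplified example written for this page. It is not wink-tokenizer's implementation or API. The `tokenizeAndTag` function, the `rules` table, the simplified patterns, and the tag names `mention`, `time`, `number`, `punctuation` and `unknown` are all assumptions.

```javascript
// HYPOTHETICAL sketch of a "tokenize and tag" loop. This is NOT how
// wink-tokenizer is implemented, and every pattern here is simplified.
// Rules are tried in order at the current position, so specific patterns
// (email, emoticon, time) must come before generic ones (word, punctuation).
const rules = [
  { tag: 'email', rgx: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/y },
  { tag: 'mention', rgx: /@\w+/y },
  { tag: 'hashtag', rgx: /#\w+/y },
  { tag: 'emoticon', rgx: /:-?[()]/y },
  { tag: 'time', rgx: /\d{1,2}(?::\d{2})?\s?[ap]m\b/iy },
  { tag: 'number', rgx: /\d+/y },
  // ASCII letters plus Latin-1 Supplement letters (same ranges as the 2.1.0 rgxWord).
  { tag: 'word', rgx: /[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/iy },
  { tag: 'punctuation', rgx: /[.,:;!?]/y }
];

function tokenizeAndTag(text) {
  const tokens = [];
  const space = /\s+/y;
  let pos = 0;
  while (pos < text.length) {
    // Skip any whitespace between tokens.
    space.lastIndex = pos;
    if (space.test(text)) {
      pos = space.lastIndex;
      continue;
    }
    let matched = false;
    for (const { tag, rgx } of rules) {
      // A sticky (/y) regex only matches exactly at lastIndex.
      rgx.lastIndex = pos;
      const m = rgx.exec(text);
      if (m) {
        tokens.push({ value: m[0], tag: tag });
        pos = rgx.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Fallback: emit one whole code point (e.g. an emoji) as 'unknown'
      // so the loop always advances.
      const ch = String.fromCodePoint(text.codePointAt(pos));
      tokens.push({ value: ch, tag: 'unknown' });
      pos += ch.length;
    }
  }
  return tokens;
}

console.log(
  tokenizeAndTag('Mieux vaut prévenir que guérir:-)')
    .map((t) => t.value + '/' + t.tag)
    .join(' ')
);
// prints: Mieux/word vaut/word prévenir/word que/word guérir/word :-)/emoticon
```

Rule order is the main design choice. If `punctuation` came before `emoticon`, then `:-)` would never be tagged as an emoticon. If `word` came before `email`, then `r2d2@gmail.com` would be split into several tokens.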
// wink-tokenizer
-// Versatile tokenizer that automatically tags each token with its type.
+// Multilingual tokenizer that automatically tags each token with its type.
//

@@ -36,3 +36,4 @@ // Copyright (C) 2017 GRAYPE Systems Private Limited

var rgxTime = /(?:\d|[01]\d|2[0-3]):?(?:[0-5][0-9])?\s?(?:[ap]m|hours|hrs)\b/gi;
-var rgxWord = /[a-z]+\'[a-z]{1,2}|[a-z]+s\'|[a-z]+/gi;
+// Include [Latin-1 Supplement Unicode Block](https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block))
+var rgxWord = /[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\'[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]{1,2}|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+s\'|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/gi;
// Special regex to handle not elisions at sentence level itself.

@@ -39,0 +40,0 @@ var rgxNotElision = /([a-z])(n\'t)\b/gi;
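You can see the effect of the `rgxWord` change by running the 2.0.1 and 2.1.0 patterns side by side on the French sentence from the README. This standalone check exercises only the two regular expressions, not the full tokenizer. The variable names `rgxWordOld` and `rgxWordNew` are hypothetical.

```javascript
// Both patterns are copied verbatim from the diff above.
const rgxWordOld = /[a-z]+\'[a-z]{1,2}|[a-z]+s\'|[a-z]+/gi; // 2.0.1
const rgxWordNew = /[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\'[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]{1,2}|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+s\'|[a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/gi; // 2.1.0

const sentence = 'Mieux vaut prévenir que guérir';

// With the /g flag, String.prototype.match returns every match.
const oldWords = sentence.match(rgxWordOld);
const newWords = sentence.match(rgxWordNew);

console.log(oldWords.join(' | ')); // prints: Mieux | vaut | pr | venir | que | gu | rir
console.log(newWords.join(' | ')); // prints: Mieux | vaut | prévenir | que | guérir

// The ranges skip U+00D7 (multiplication sign) and U+00F7 (division sign),
// so those two symbols still break a word.
console.log('a\u00D7b'.match(rgxWordNew).join(' | ')); // prints: a | b
```

Under 2.0.1, `é` matches no branch of the pattern, so `prévenir` and `guérir` were each split around the accented letter. The 2.1.0 ranges cover the accented letters of the Latin-1 Supplement block, which is what allows the README's French example to produce whole words.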
