Words'n'numbers
Extract arrays of words, and optionally numbers, from strings. For Node.js and the browser. Useful when you need more than just [a-z]. Part of the document processing for search-index and nowsearch.xyz.
Inspired by extractwords
Initiating
Node.js
const wnn = require('words-n-numbers')
Browser
<script src="wnn.js"></script>
<script>
</script>
Use
The default regex should match Unicode letters from every language.
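To give a feel for what Unicode-aware matching means in practice, here is a minimal sketch using JavaScript's Unicode property escapes. The pattern `unicodeLetters` is an assumption made for illustration; it is not the regex that words-n-numbers ships with.

```javascript
// Illustrative pattern only: an assumed stand-in, not the library's actual default regex.
// \p{L} matches any Unicode letter; the `u` flag enables property escapes.
const unicodeLetters = /\p{L}+/gu

const sample = 'Привет, мир! Hello 42'
const found = sample.match(unicodeLetters) || []

console.log(found) // [ 'Привет', 'мир', 'Hello' ]
```

A plain `[a-z]+` pattern would only pick up `ello` from this string, which is the gap a Unicode-aware default is meant to close.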
Only words
let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords)
Only words, converted to lowercase
let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords, { toLowercase: true })
Predefined regex for words and numbers, converted to lowercase
let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords, { regex: wnn.wordsAndNumbers, toLowercase: true })
Custom regex
let stringOfWords = 'This happens at 5 o\'clock !!!'
wnn.extract(stringOfWords, { regex: '[a-z\'0-9]+' })
API
Returns an array of words and optionally numbers.
wnn.extract(stringOfText, <options-object>)
Options object
{
regex: '[custom or predefined regex]',
toLowercase: [true / false]
}
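As a rough mental model of how the two options interact, the sketch below re-implements the idea in a few lines. Everything in it is hypothetical: `extractSketch`, `LETTERS`, and `LETTERS_AND_DIGITS` are names invented for this example, and compiling the pattern with the `giu` flags is an assumption, not a statement about the library's internals.

```javascript
// Hypothetical re-implementation for illustration only; not the library's source.
const LETTERS = '\\p{L}+' // assumed stand-in for wnn.words
const LETTERS_AND_DIGITS = '[\\p{L}\\p{N}]+' // assumed stand-in for wnn.wordsAndNumbers

function extractSketch (text, options = {}) {
  const { regex = LETTERS, toLowercase = false } = options
  // Assumption: the pattern is a string compiled with global,
  // case-insensitive and Unicode flags.
  const pattern = new RegExp(regex, 'giu')
  const input = toLowercase ? text.toLowerCase() : text
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return input.match(pattern) || []
}

const sentence = 'A 1000000 dollars baby!'

console.log(extractSketch(sentence))
// [ 'A', 'dollars', 'baby' ]
console.log(extractSketch(sentence, { toLowercase: true }))
// [ 'a', 'dollars', 'baby' ]
console.log(extractSketch(sentence, { regex: LETTERS_AND_DIGITS, toLowercase: true }))
// [ 'a', '1000000', 'dollars', 'baby' ]
console.log(extractSketch('This happens at 5 o\'clock !!!', { regex: '[a-z\'0-9]+' }))
// [ 'This', 'happens', 'at', '5', "o'clock" ]
```

In this sketch the case-insensitive flag is what lets a custom `[a-z...]` pattern still match the capital `T` in `This`; without it, the first token would be `his`.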
Predefined regexes
wnn.words
wnn.numbers
wnn.wordsAndNumbers
Languages supported
Supports all languages supported by stopword, and more. Some languages, such as Japanese and Simplified Chinese, need to be tokenized before extraction. Tokenizers may be added at a later stage.
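Until tokenizers are part of this package, one possible workaround is to pre-tokenize such text yourself. The sketch below uses the standard `Intl.Segmenter` API (available in recent Node.js versions and modern browsers), which is a separate platform feature and not part of words-n-numbers. The helper name `segmentWords` is made up for this example, and the exact token boundaries depend on the ICU data bundled with your runtime.

```javascript
// Hypothetical helper: splits text into word-like segments using the
// platform's Intl.Segmenter. Not part of words-n-numbers.
function segmentWords (text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
  return Array.from(segmenter.segment(text))
    .filter(part => part.isWordLike) // drop whitespace and punctuation segments
    .map(part => part.segment)
}

const japanese = '今日は天気がいいです'
const tokens = segmentWords(japanese, 'ja')

// Joining with spaces yields a string that a whitespace-oriented
// word regex can then split on.
console.log(tokens.join(' '))
```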
PRs welcome
PRs and issues are more than welcome =)