Extracting arrays of words and optionally numbers from strings. For Node.js and the browser. When you need more than just [a-z]. Part of document processing for search-index and nowsearch.xyz.

Inspired by extractwords

Initiating

Node.js

const wnn = require('words-n-numbers')
// wnn available

Browser

<script src="wnn.js"></script>

<script>
  //wnn available
</script>

Use

The default regex should catch every Unicode character for every language.
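To illustrate why a plain [a-z] pattern is not enough, here is a minimal sketch that compares an ASCII-only regex with one built on Unicode property escapes. The two patterns below are assumptions chosen for illustration, not the regexes that words-n-numbers actually ships.

```javascript
// Illustrative patterns only; these are NOT the library's own regexes.
const asciiOnly = /[a-z]+/gi        // ASCII letters, case-insensitive
const unicodeLetters = /\p{L}+/gu   // any Unicode letter (needs the "u" flag)

const sample = 'Привет, мир! Hello'

// The ASCII pattern silently drops the Cyrillic words
const asciiTokens = sample.match(asciiOnly)
console.log(asciiTokens)   // [ 'Hello' ]

// The Unicode-aware pattern keeps words from both scripts
const unicodeTokens = sample.match(unicodeLetters)
console.log(unicodeTokens) // [ 'Привет', 'мир', 'Hello' ]
```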

Only words

let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords)
// returns ['A', 'dollars', 'baby']

Only words, converted to lowercase

let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords, { toLowercase: true })
// returns ['a', 'dollars', 'baby']

Predefined regex for words and numbers, converted to lowercase

let stringOfWords = 'A 1000000 dollars baby!'
wnn.extract(stringOfWords, { regex: wnn.wordsAndNumbers, toLowercase: true })
// returns ['a', '1000000', 'dollars', 'baby']

Custom regex

let stringOfWords = 'This happens at 5 o\'clock !!!'
wnn.extract(stringOfWords, { regex: '[a-z\'0-9]+' })
// returns ['This', 'happens', 'at', '5', 'o\'clock']

API

Extract function

Returns an array of words and optionally numbers.

wnn.extract(stringOfText, <options-object>)

Options object

{
  regex: '[custom or predefined regex]',  // defaults to wnn.words
  toLowercase: [true / false]             // defaults to false
}

Predefined regexes

wnn.words            // only words, any language <-- default
wnn.numbers          // only numbers, any language
wnn.wordsAndNumbers  // words and numbers, any language
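The way the options and predefined regexes fit together can be sketched with a small stand-alone function. Everything here is hypothetical: extractSketch, sketchWords and sketchWordsAndNumbers are illustrative names, the patterns are assumed stand-ins for wnn.words and wnn.wordsAndNumbers, and the 'giu' flags for string patterns as well as the empty-array result for no matches are choices made for this sketch. The real wnn.extract may behave differently.

```javascript
// Hypothetical re-implementation for illustration only.
const sketchWords = /\p{L}+/gu                   // assumed stand-in for wnn.words
const sketchWordsAndNumbers = /[\p{L}\p{N}]+/gu  // assumed stand-in for wnn.wordsAndNumbers

function extractSketch (text, options) {
  // Merge caller options over the defaults described above
  const { regex, toLowercase } = Object.assign(
    { regex: sketchWords, toLowercase: false },
    options
  )
  // Assumption: string patterns are compiled global, case-insensitive, Unicode-aware
  const pattern = typeof regex === 'string' ? new RegExp(regex, 'giu') : regex
  // String.prototype.match returns null when nothing matches
  const tokens = text.match(pattern) || []
  return toLowercase ? tokens.map(token => token.toLowerCase()) : tokens
}

console.log(extractSketch('A 1000000 dollars baby!'))
// [ 'A', 'dollars', 'baby' ]
console.log(extractSketch('A 1000000 dollars baby!', { regex: sketchWordsAndNumbers, toLowercase: true }))
// [ 'a', '1000000', 'dollars', 'baby' ]
console.log(extractSketch('!!!'))
// []
```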

Languages supported

Supports all languages supported by stopword, and more. Some languages, like Japanese and Simplified Chinese, need to be tokenized first. Tokenizers may be added at a later stage.
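For text written without spaces between words, a letter-run regex returns the whole phrase as a single token. One possible way to pre-tokenize such text is the built-in Intl.Segmenter, sketched below. This is not part of words-n-numbers; segmentWords is a hypothetical helper, and the exact word boundaries depend on the ICU data bundled with the runtime.

```javascript
// Hypothetical helper, not part of words-n-numbers.
function segmentWords (text, locale) {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    // No segmenter in this runtime: fall back to plain letter runs
    return text.match(/\p{L}+/gu) || []
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
  return Array.from(segmenter.segment(text))
    .filter(part => part.isWordLike)  // skip punctuation and whitespace
    .map(part => part.segment)
}

const phrase = '我爱北京天安门'

// A letter-run regex sees one unbroken "word"
console.log(phrase.match(/\p{L}+/gu)) // [ '我爱北京天安门' ]

// The segmenter splits the phrase using dictionary data from the runtime
console.log(segmentWords(phrase, 'zh'))
```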

PRs welcome

PRs and issues are more than welcome =)


Last updated on 25 Apr 2020
