npm install compromise-numbers
const nlp = require('compromise')
nlp.extend(require('compromise-numbers'))
let doc = nlp('I’d like to request seventeen dollars for a push broom rebristling')
doc.numbers().debug()
API
Opinions:
If a number is changed within a sentence, the plugin attempts to keep the sentence in agreement, adjusting both a leading determiner and the plurality of a following noun.
This is done conservatively, but it may have sneaky or unintended effects for some applications.
Money, fractions, and percentages will be returned and work fine in .numbers(), but can be isolated with .money(), .fractions() and .percentages().
Fractions
.fractions() will parse things like '1/3', 'one out of three', and 'one third'.
It will not pluck the fraction from the end of a number, like 'six and one third', although 'one third' will still have a #Fraction tag.
Things can get pretty crazy - and there are some human-ambiguous fractions like 'five hundred thousandths'. In these cases it tries its best.
Attempts are also made to avoid conversational fractions, like 'half time show' or dates like '3rd quarter 2020'.
Money
- ambiguous currencies: many currency symbols are re-used for different countries. We try to make some safe assumptions about this. compromise-numbers assumes a naked $ is USD, £ is GBP, ₩ is South Korean won, and 'kr' is Swedish Krona. Configuring this should be possible in future versions.
- decimal currencies: nlp('five cents').money().get(0) will return 0.05 (like it should), but .numbers().get() will return 5. This is a tricky thing that we should solve, somehow.
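The symbol assumptions listed under "ambiguous currencies" can be pictured as a small lookup table. This is only an illustrative sketch: `assumedCurrency` and `resolveCurrency` are hypothetical names, the ISO 4217 codes are added here for clarity, and none of this is the plugin's actual internal data or API.

```javascript
// Illustrative lookup only; NOT the plugin's internal table.
// The ISO 4217 codes are an addition made for this sketch.
const assumedCurrency = {
  '$': 'USD', // a naked dollar sign is assumed to be US dollars
  '£': 'GBP', // pound sterling
  '₩': 'KRW', // South Korean won
  'kr': 'SEK', // Swedish krona
}

// Hypothetical helper: resolve a bare symbol, or return null if it is unknown
function resolveCurrency(symbol) {
  return Object.prototype.hasOwnProperty.call(assumedCurrency, symbol)
    ? assumedCurrency[symbol]
    : null
}

console.log(resolveCurrency('kr'))
console.log(resolveCurrency('€'))
```

An application that knows its own locale (for example, one where 'kr' means Danish or Norwegian krone) could keep a table like this on its side until the defaults become configurable.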
Years and Time
Times like 5pm are parsed and handled by compromise-dates and are not returned by .numbers().
In particular, #Year tags are applied to numbers in a delicate way.
Decimal separators
compromise-numbers uses the period decimal point and supports comma as a thousands-separator.
Some European or Latin-American number formats, like comma-decimals or space-separated thousands, do not parse properly.
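If your input is known to use comma-decimals, one possible workaround is to normalize each number token before handing the text to nlp(). The helper below is a minimal sketch, not part of compromise-numbers: `normalizeCommaDecimal` is a hypothetical name, and it assumes you have already decided the token uses the comma-decimal convention.

```javascript
// Hypothetical pre-processing helper; NOT part of compromise-numbers.
// Accepts a single number token such as '1.234,56', '1 234,56' or '12,5'
// and rewrites it with a period decimal point and no thousands separators.
// Caveat: '1.234' is ambiguous; this sketch reads it as 1234.
function normalizeCommaDecimal(token) {
  // grouped thousands (period, space, or no-break spaces) or plain digits,
  // followed by an optional comma-decimal part
  const pattern = /^(?:\d{1,3}(?:[.\u00A0\u202F ]\d{3})+|\d+)(?:,\d+)?$/
  if (!pattern.test(token)) {
    return token // leave anything else untouched, e.g. '1,234.56' or '1.5'
  }
  return token
    .replace(/[.\u00A0\u202F ]/g, '') // drop thousands separators
    .replace(',', '.') // comma decimal -> period decimal
}

console.log(normalizeCommaDecimal('1.234,56'))
console.log(normalizeCommaDecimal('1 234,56'))
console.log(normalizeCommaDecimal('1,234.56'))
```

Tokens that already follow the period-decimal convention are returned unchanged, so the helper does not interfere with the formats the plugin already supports.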
Serial numbers
Attempts are made to exclude phone numbers, postal codes and credit-card numbers from .numbers() results, but there may be numbers used in other ways that are not accounted for.
work in progress!
MIT