Socket
Book a DemoInstallSign in
Socket

@shelf/text-normalizer

Package Overview
Dependencies
Maintainers
59
Versions
8
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@shelf/text-normalizer

Text normalizer initially done for openai/whisper but ported to TS with love by shelf.io!

2.0.1
latest
npmnpm
Version published
Weekly downloads
18
-92.14%
Maintainers
59
Weekly downloads
 
Created
Source

text-normalizer CircleCI

Originally took from openai/whisperer and rewrote to TS

TypeScript library for normalizing English text. It provides a utility class EnglishTextNormalizer with methods for normalizing various types of text, such as contractions, abbreviations, and spacing. EnglishTextNormalizer consists of other classes you can reuse independently:

  • EnglishSpellingNormalizer - uses a dictionary of English words and their American spelling. The dictionary is stored in a JSON file named english.json
  • EnglishNumberNormalizer - works specifically to normalize text from English words to actually numbers
  • BasicTextNormalizer - provides methods for removing special characters and diacritics from text, as well as splitting words into separate letters.

Install

$ yarn add @shelf/text-normalizer

Usage

Node.js

import {EnglishTextNormalizer} from '@shelf/text-normalizer';

const normalizer = new EnglishTextNormalizer();

console.log(normalizer.normalize("Let's")); // Output: let us
console.log(normalizer.normalize("he's like")); // Output: he is like
console.log(normalizer.normalize("she's been like")); // Output: she has been like
console.log(normalizer.normalize('10km')); // Output: 10 km
console.log(normalizer.normalize('10mm')); // Output: 10 mm
console.log(normalizer.normalize('RC232')); // Output: rc 232
console.log(normalizer.normalize('Mr. Park visited Assoc. Prof. Kim Jr.')); // Output: mister park visited associate professor kim junior

Browser

import {EnglishTextNormalizer} from 'https://esm.sh/@shelf/text-normalizer';

const normalizer = new EnglishTextNormalizer();

console.log(normalizer.normalize("Let's")); // Output: let us
console.log(normalizer.normalize("he's like!")); // Output: he is like

Advanced Usage

Using EnglishNumberNormalizer

import {EnglishNumberNormalizer} from '@shelf/text-normalizer';

const numberNormalizer = new EnglishNumberNormalizer();

console.log(numberNormalizer.normalize('twenty-five')); // Output: 25
console.log(numberNormalizer.normalize('three million')); // Output: 3000000
console.log(numberNormalizer.normalize('two and a half')); // Output: 2.5
console.log(numberNormalizer.normalize('fifty percent')); // Output: 50%

Using EnglishSpellingNormalizer

import {EnglishSpellingNormalizer} from '@shelf/text-normalizer';

const spellingNormalizer = new EnglishSpellingNormalizer();

console.log(spellingNormalizer.normalize('colour')); // Output: color
console.log(spellingNormalizer.normalize('organise')); // Output: organize

Using BasicTextNormalizer

import {BasicTextNormalizer} from '@shelf/text-normalizer';

const basicNormalizer = new BasicTextNormalizer(true, true);

console.log(basicNormalizer.normalize('Café!')); // Output: c a f e
console.log(basicNormalizer.normalize('Hello [World]')); // Output: h e l l o

Configuration

BasicTextNormalizer

The BasicTextNormalizer constructor accepts two optional boolean parameters:

  • removeDiacritics (default: false): If set to true, diacritics will be removed from the text.
  • splitLetters (default: false): If set to true, letters will be split into individual characters.

Example:

const normalizer = new BasicTextNormalizer(true, true);

Performance Considerations

  • The EnglishTextNormalizer combines multiple normalization techniques and may be slower for very large texts. Consider using individual normalizers (EnglishNumberNormalizer, EnglishSpellingNormalizer, or BasicTextNormalizer) if you only need specific functionality.
  • For repeated normalization of large amounts of text, consider initializing the normalizer once and reusing it to avoid unnecessary setup time.
  • compromise - Natural language processing in JavaScript

Publish

$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags

License

MIT © Shelf

FAQs

Package last updated on 07 Oct 2024

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

About

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.

  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc

U.S. Patent No. 12,346,443 & 12,314,394. Other pending.