nlpaug

Natural language processing augmentation library for deep neural networks

This Python library helps you augment NLP data for your machine learning projects. Visit this introduction to learn about data augmentation in NLP. An Augmenter is the basic element of augmentation, while a Flow is a pipeline that orchestrates multiple augmenters together.

Features

  • Generate synthetic data to improve model performance without manual effort
  • Simple, easy-to-use and lightweight library. Augment data in 3 lines of code
  • Plug and play with any machine learning/neural network framework (e.g. scikit-learn, PyTorch, TensorFlow)
  • Supports textual and audio input

Textual Data Augmentation Example


Acoustic Data Augmentation Example


| Section | Description |
|---|---|
| Quick Demo | How to use this library |
| Augmenter | Introduces all available augmentation methods |
| Installation | How to install this library |
| Recent Changes | Latest enhancements |
| Extension Reading | More real-life examples and research |
| Reference | References to external resources such as data and models |

Quick Demo

Augmenter

| Data type | Target | Augmenter | Action | Description |
|---|---|---|---|---|
| Textual | Character | KeyboardAug | substitute | Simulate keyboard distance errors |
| Textual | Character | OcrAug | substitute | Simulate OCR engine errors |
| Textual | Character | RandomAug | insert, substitute, swap, delete | Apply augmentation randomly |
| Textual | Word | AntonymAug | substitute | Substitute a word with its opposite according to WordNet antonyms |
| Textual | Word | ContextualWordEmbsAug | insert, substitute | Feed surrounding words to a BERT, DistilBERT, RoBERTa or XLNet language model to find the most suitable word for augmentation |
| Textual | Word | RandomWordAug | swap, crop, delete | Apply augmentation randomly |
| Textual | Word | SpellingAug | substitute | Substitute words according to a spelling mistake dictionary |
| Textual | Word | SplitAug | split | Randomly split one word into two words |
| Textual | Word | SynonymAug | substitute | Substitute similar words according to WordNet/PPDB synonyms |
| Textual | Word | TfIdfAug | insert, substitute | Use TF-IDF to determine how a word should be augmented |
| Textual | Word | WordEmbsAug | insert, substitute | Leverage word2vec, GloVe or fasttext embeddings to apply augmentation |
| Textual | Word | BackTranslationAug | substitute | Leverage two translation models for augmentation |
| Textual | Word | ReservedAug | substitute | Replace reserved words |
| Textual | Sentence | ContextualWordEmbsForSentenceAug | insert | Insert a sentence according to XLNet, GPT2 or DistilGPT2 prediction |
| Textual | Sentence | AbstSummAug | substitute | Summarize an article with an abstractive summarization method |
| Textual | Sentence | LambadaAug | substitute | Use a language model to generate text, then a classification model to retain high-quality results |
| Signal | Audio | CropAug | delete | Delete an audio segment |
| Signal | Audio | LoudnessAug | substitute | Adjust audio volume |
| Signal | Audio | MaskAug | substitute | Mask an audio segment |
| Signal | Audio | NoiseAug | substitute | Inject noise |
| Signal | Audio | PitchAug | substitute | Adjust audio pitch |
| Signal | Audio | ShiftAug | substitute | Shift the time dimension forward/backward |
| Signal | Audio | SpeedAug | substitute | Adjust audio speed |
| Signal | Audio | VtlpAug | substitute | Change vocal tract length |
| Signal | Audio | NormalizeAug | substitute | Normalize audio |
| Signal | Audio | PolarityInverseAug | substitute | Swap positive and negative values of audio |
| Signal | Spectrogram | FrequencyMaskingAug | substitute | Set a block of values to zero along the frequency dimension |
| Signal | Spectrogram | TimeMaskingAug | substitute | Set a block of values to zero along the time dimension |
| Signal | Spectrogram | LoudnessAug | substitute | Adjust volume |
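The table only names each method. As a rough illustration of what the word-level "swap" and "delete" actions of RandomWordAug mean, here is a minimal pure-Python sketch. It is not nlpaug's implementation: the function names, whitespace tokenization and the `aug_p`/`n_swaps` parameters are assumptions chosen for this example and are not guaranteed to match nlpaug's arguments.

```python
import random


def random_word_delete(text, aug_p=0.3, rng=None):
    """Hypothetical sketch: drop each word with probability aug_p."""
    rng = rng or random.Random()
    words = text.split()  # assumption: simple whitespace tokenization
    kept = [w for w in words if rng.random() >= aug_p]
    # Never return an empty string for non-empty input: keep one random word.
    if not kept and words:
        kept = [rng.choice(words)]
    return " ".join(kept)


def random_word_swap(text, n_swaps=1, rng=None):
    """Hypothetical sketch: swap n_swaps randomly chosen adjacent word pairs."""
    rng = rng or random.Random()
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible output
    sentence = "The quick brown fox jumps over the lazy dog"
    print(random_word_delete(sentence, aug_p=0.3, rng=rng))
    print(random_word_swap(sentence, n_swaps=2, rng=rng))
```

Passing a seeded `random.Random` keeps augmentation reproducible, which is handy when comparing models trained on augmented data.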

Flow

| Type | Augmenter | Description |
|---|---|---|
| Pipeline | Sequential | Apply a list of augmentation functions sequentially |
| Pipeline | Sometimes | Randomly apply some of the augmentation functions |
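To show the difference between the two pipeline types, here is a minimal pure-Python sketch of a Flow. The class names mirror the table, but this is not nlpaug's flow code: the constructor arguments, the `augment` method and the per-step probability `p` used by `Sometimes` are assumptions for illustration, and the two toy augmenters are hypothetical helpers.

```python
import random
from typing import Callable, List, Optional

Augment = Callable[[str], str]


class Sequential:
    """Hypothetical sketch: apply every augmentation function in order."""

    def __init__(self, flow: List[Augment]):
        self.flow = flow

    def augment(self, text: str) -> str:
        for fn in self.flow:
            text = fn(text)
        return text


class Sometimes:
    """Hypothetical sketch: apply each function independently with probability p."""

    def __init__(self, flow: List[Augment], p: float = 0.5,
                 rng: Optional[random.Random] = None):
        self.flow = flow
        self.p = p
        self.rng = rng or random.Random()

    def augment(self, text: str) -> str:
        for fn in self.flow:
            if self.rng.random() < self.p:  # skip this step with probability 1 - p
                text = fn(text)
        return text


# Two toy augmenters (hypothetical), used only for this demonstration.
def upper_first_word(text: str) -> str:
    words = text.split()
    if words:
        words[0] = words[0].upper()
    return " ".join(words)


def append_marker(text: str) -> str:
    return text + " !"


if __name__ == "__main__":
    pipeline = Sequential([upper_first_word, append_marker])
    print(pipeline.augment("data augmentation helps"))  # DATA augmentation helps !
    maybe = Sometimes([upper_first_word, append_marker], p=0.5, rng=random.Random(42))
    print(maybe.augment("data augmentation helps"))
```

Because each pipeline exposes a plain `str -> str` method, pipelines can be nested, e.g. `Sequential([Sometimes([append_marker]).augment, upper_first_word])`.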

Installation

The library supports Python 3.5+ on Linux and Windows.

To install the library:

pip install numpy requests nlpaug

Or install the latest version (including beta features) directly from GitHub:

pip install numpy git+https://github.com/makcedward/nlpaug.git

Or install via conda:

conda install -c makcedward nlpaug

If you use BackTranslationAug, ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug or AbstSummAug, install the following dependencies as well:

pip install "torch>=1.6.0" "transformers>=4.11.3" sentencepiece

If you use LambadaAug, install the following dependencies as well:

pip install "simpletransformers>=0.61.10"

If you use AntonymAug or SynonymAug, install the following dependencies as well:

pip install "nltk>=3.4.5"

If you use WordEmbsAug (word2vec, GloVe or fasttext), download a pre-trained model first and install the following dependencies as well:

from nlpaug.util.file.download import DownloadUtil
DownloadUtil.download_word2vec(dest_dir='.') # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.') # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.') # Download fasttext model

pip install "gensim>=4.1.2"

If you use SynonymAug (PPDB), download the file from the following URI. You may not be able to run the augmenter if you get the PPDB file from another website.

http://paraphrase.org/#/download

If you use PitchAug, SpeedAug or VtlpAug, install the following dependencies as well:

pip install "librosa>=0.9.1" matplotlib

Recent Changes

1.1.11 Jul 6, 2022

See changelog for more details.

Extension Reading

Reference

This library uses data (e.g. captured from the internet), research (e.g. the ideas behind augmenters) and models (e.g. pre-trained models). See data source for more details.

Citation

@misc{ma2019nlpaug,
  title={NLP Augmentation},
  author={Edward Ma},
  howpublished={https://github.com/makcedward/nlpaug},
  year={2019}
}

This package is cited by many books, workshops and academic research papers (70+). Here are some examples; visit here to get the full list.

Workshops citing nlpaug

Books citing nlpaug

Research papers citing nlpaug

Contributions


sakares saengkaew


Binoy Dalal


Emrecan Çelik
