Natural Language Toolkit
Python package and command-line tool designed to gather text on the Web. It includes all necessary discovery and text processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata and comments.
Natural language processing augmentation library for deep neural networks
Module for automatic summarization of text documents and HTML pages.
NeMo text processing for ASR and TTS
Thai Natural Language Processing library
An accurate natural language detection library, suitable for short text and mixed-language text
Python package for Korean natural language processing.
Microsoft Azure Text Analytics Client Library for Python
Textile processing for Python.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US, Canadian and British addresses.
A library for extracting abbreviations from text.
Extract quantities from unstructured text.
Functions to preprocess and normalize text.
Generalist model for NER (extracts any entity type from text)
Blazing-fast Thai text processing library powered by Rust
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.
Python library for processing Chinese text
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
Identification and conversion functions for Chinese text processing
A command to manage a header section for a source code tree
NLP, before and after spaCy
STAM is a library for dealing with standoff annotations on text. This is the Python binding.
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein). This is the Python binding.
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
A text summarization and keyword extraction package based on TextRank
Natural Language Processing (NLP) library for the Urdu language.
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Pre-processing package for text strings
An extensible tool to process legal citations in text
Wrappers for several pre-processing scripts from the Moses toolkit.
Nonsense String Evaluator
A Python library for a _FULL_ Zalgo experience
Text2Text Language Modeling Toolkit
Python bindings for MeTA
A base class for wrapping text-processing tools
A text-to-intent parsing framework.
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.
BENT: Biomedical Entity Annotator
Simple text-processing and text-analytics command-line tool made in Python.
HuSpaCy: industrial strength Hungarian natural language processing
An AI-powered tool to clean manga panels.
Interface with various cloud APIs for language processing, such as translation and text-to-speech