Natural Language Toolkit
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Natural language processing augmentation library for deep neural networks
Python package for Korean natural language processing.
An accurate natural language detection library, suitable for short text and mixed-language text
Thai Natural Language Processing library
Microsoft Azure Text Analytics Client Library for Python
Textile processing for Python.
Module for automatic summarization of text documents and HTML pages.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US, Canadian and British addresses.
Extract quantities from unstructured text.
Generalist model for NER (extracts any entity type from text)
Functions to preprocess and normalize text.
NeMo text processing for ASR and TTS
Blazing-fast Thai text processing library powered by Rust
Python library for processing Chinese text
Natural Language Processing (NLP) library for the Urdu language.
A library for extracting abbreviations from text.
Identification and conversion functions for Chinese text processing
NLP, before and after spaCy
A command to manage a header section for a source code tree
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
A text summarization and keyword extraction package based on TextRank
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Pre-processing package for text strings
Wrappers for several pre-processing scripts from the Moses toolkit.
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
A base class for wrapping text-processing tools
Nonsense String Evaluator
Analiticcl is an approximate string-matching (fuzzy-matching) system that can be used to find variants for spelling correction or text normalisation.
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein). This is the Python binding.
A Python library for a _FULL_ Zalgo experience
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
A library for standardizing terms with spelling variations using a synonym dictionary.
Unsupervised Korean Natural Language Processing Toolkits
A library for augmenting text for natural language processing applications.
Get descriptions of images from OpenAI, Azure OpenAI, and Anthropic Claude models with support for local files and batch processing.
Text2Text Language Modeling Toolkit
Snips Natural Language Understanding library
Text processing with pandas DataFrames.
A library for calculating a variety of features from text using spaCy
Phrase Tree from Natural Language Toolkit
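
Several entries above mention edit distance and fuzzy string matching. As a rough illustration of what a Levenshtein distance computes, here is a minimal pure-Python sketch. The function name `levenshtein` is hypothetical and is not the API of any package listed here; the listed bindings may use different algorithms (for example Myers' diff) and different cost models.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between a and b.

    Illustrative sketch only: classic dynamic programming with
    unit costs for insertion, deletion, and substitution.
    """
    # Keep the shorter string as b so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))

    for i, ch_a in enumerate(a, start=1):
        # Distance between a[:i] and the empty string is i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The sketch keeps only two rows of the dynamic-programming table, so memory use grows with the length of the shorter string rather than with the product of both lengths.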