Natural Language Toolkit
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Python package for Korean natural language processing.
Thai Natural Language Processing library
An accurate natural language detection library, suitable for short text and mixed-language text
Textile processing for Python.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US, Canadian, and British addresses.
Microsoft Azure Text Analytics Client Library for Python
Natural language processing augmentation library for deep neural networks
Extract quantities from unstructured text.
Module for automatic summarization of text documents and HTML pages.
Generalist model for NER (extracts any entity type from text)
NeMo text processing for ASR and TTS
Functions to preprocess and normalize text.
A library for extracting abbreviations from text.
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and natural language processing tasks in Indian languages.
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
Blazing-fast Thai text processing library powered by Rust
Python library for processing Chinese text
NLP, before and after spaCy
Identification and conversion functions for Chinese text processing
Natural Language Processing (NLP) library for the Urdu language.
Wrappers for several pre-processing scripts from the Moses toolkit.
A base class for wrapping text-processing tools
Text2Text Language Modeling Toolkit
A text summarization and keyword extraction package based on TextRank
A command to manage a header section for a source code tree
Pre-processing package for text strings
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Nonsense String Evaluator
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
A Python library for a _FULL_ Zalgo experience
Python ctypes bindings for reliq
An AI-powered tool to clean manga panels.
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein). This is the Python binding.
A library for augmenting text for natural language processing applications.
Python bindings for MeTA
An augmentation library based on spaCy for joint augmentation of text and labels.
An extensible tool to process legal citations in text
Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
A library for calculating a variety of features from text using spaCy
Tools for organizing collections of text for entity-centric stream processing.
Open-source tool for exploring, labeling, and monitoring data for NLP projects.