Natural Language Toolkit
Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, XML.
An accurate natural language detection library, suitable for short text and mixed-language text
Thai Natural Language Processing library
Textile processing for Python.
Microsoft Azure Text Analytics Client Library for Python
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US, Canadian and British addresses.
Natural language processing augmentation library for deep neural networks
Module for automatic summarization of text documents and HTML pages.
Extract quantities from unstructured text.
Extensive Language Pack for Tree-Sitter
Generalist model for NER (extract any entity type from text)
Python package for Korean natural language processing.
NeMo text processing for ASR and TTS
Functions to preprocess and normalize text.
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
NLP, before and after spaCy
Python library for processing Chinese text
A command to manage a header section for a source code tree
Natural Language Processing (NLP) library for the Urdu language.
A text summarization and keyword extraction package based on TextRank
A library for extracting abbreviations from text.
Wrappers for several pre-processing scripts from the Moses toolkit.
A base class for wrapping text-processing tools
Identification and conversion functions for Chinese text processing
Nonsense String Evaluator
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Pre-processing package for text strings
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
A Python library for a _FULL_ Zalgo experience
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.
Open-source tool for exploring, labeling, and monitoring data for NLP projects.
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
Python ctypes bindings for reliq
Analiticcl is an approximate string-matching (fuzzy-matching) system that can be used to find variants for spelling correction or text normalisation
An augmentation library based on SpaCy for joint augmentation of text and labels.
Simple text-processing and text-analytics command-line tool made in Python.
Text2Text Language Modeling Toolkit
Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
An extensible tool to process legal citations in text
A neural network intent parser
The metadata and text content extractor for almost every file type.
Python bindings for MeTA