Natural Language Toolkit
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
An accurate natural language detection library, suitable for short text and mixed-language text
Thai Natural Language Processing library
Microsoft Azure Text Analytics Client Library for Python
Natural language processing augmentation library for deep neural networks
Python package for Korean natural language processing.
Textile processing for Python.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. It currently supports US, Canadian and British addresses.
Extract quantities from unstructured text.
Module for automatic summarization of text documents and HTML pages.
A library for extracting abbreviations from text.
Functions to preprocess and normalize text.
Generalist model for NER (extracts any entity type from text)
Blazing-fast Thai text processing library powered by Rust
NeMo text processing for ASR and TTS
Python library for processing Chinese text
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein). This is the Python binding.
Identification and conversion functions for Chinese text processing
NLP, before and after spaCy
A text summarization and keyword extraction package based on TextRank
A command to manage a header section for a source code tree
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
Pre-processing package for text strings
Natural Language Processing (NLP) library for the Urdu language.
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Wrappers for several pre-processing scripts from the Moses toolkit.
A base class for wrapping text-processing tools
Nonsense String Evaluator
Analiticcl is an approximate string-matching (fuzzy-matching) system that can be used to find variants for spelling correction or text normalisation.
A Python library for a _FULL_ Zalgo experience
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
Simple text-processing and text-analytics command-line tool made in Python.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
A text-to-intent parsing framework.
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
A library for standardizing terms with spelling variations using a synonym dictionary.
A neural network intent parser
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
BENT: Biomedical Entity Annotator
Unsupervised Korean Natural Language Processing Toolkits
Aspose.PSD for Python via .NET is a standalone API to read, write, process, and convert Adobe Photoshop PSD and PSB files without installing Adobe Photoshop®, and AI files without Adobe Illustrator®.
Text2Text Language Modeling Toolkit
Phrase Tree from Natural Language Toolkit