Natural Language Toolkit
Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML.
Thai Natural Language Processing library
An accurate natural language detection library, suitable for short text and mixed-language text
Comprehensive collection of 160+ tree-sitter language parsers
Textile processing for Python.
Microsoft Azure Text Analytics Client Library for Python
Python package for Korean natural language processing.
Natural language processing augmentation library for deep neural networks
Generalist model for NER (Extract any entity types from texts)
Extract quantities from unstructured text.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. It currently supports US, Canadian, and British addresses.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library
Functions to preprocess and normalize text.
Module for automatic summarization of text documents and HTML pages.
A modern, type-safe Python library for converting HTML to Markdown with comprehensive tag support and customizable options
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
Python library for processing Chinese text
NeMo text processing for ASR and TTS
A text summarization and keyword extraction package based on TextRank
Nonsense String Evaluator
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
NLP, before and after spaCy
Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats
A library for extracting abbreviations from text.
Identification and conversion functions for Chinese text processing
Natural Language Processing (NLP) library for the Urdu language.
A Python library for a _FULL_ Zalgo experience
Python ctypes bindings for reliq
Wrappers for several pre-processing scripts from the Moses toolkit.
A fast Voice Activity Detection and Transcription System
A base class for wrapping text-processing tools
Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract
Onnx Text Recognition (OnnxTR) OCR plugin for docling
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
A command to manage a header section for a source code tree
Unsupervised Korean Natural Language Processing Toolkits
Pre-processing package for text strings
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
Aspose.PSD for Python via .NET is a standalone API to read, write, process, and convert Adobe Photoshop PSD and PSB files without installing Adobe Photoshop®, and AI files without Adobe Illustrator®
A text-to-intent parsing framework.
Core libraries for natural language processing