Shared Python library for internal microservices
Analiticcl is an approximate string matching (fuzzy-matching) system that can be used to find variants for spelling correction or text normalisation.
Phrase Tree from Natural Language Toolkit
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages. This fork is specialized for IndicTrans2.
An augmentation library based on spaCy for joint augmentation of text and labels.
Python bindings for MeTA
Text processing with pandas DataFrames.
Aspose.PSD for Python via .NET is a standalone API to read, write, process, and convert Adobe Photoshop PSD and PSB files without installing Adobe Photoshop®, and AI files without installing Adobe Illustrator®.
A library for augmenting text for natural language processing applications.
A simple AI toolkit for text processing using OpenAI and Gemini APIs
A text-to-intent parsing framework.
A neural network intent parser
Effortless LLM extraction from documents
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
A Python wrapper library that brings the speed of Rust text processing to Python
Breame is a lightweight Python package with a number of tools to aid in the detection of words that have dual spellings and meanings in British and American English.
HuSpaCy: industrial strength Hungarian natural language processing
An extensible tool to process legal citations in text
fenic is a Python DataFrame library for processing text data with APIs inspired by PySpark.
A full API for hdporncomics
A Python module implementing the Rapid Automatic Keyword Extraction algorithm.
A package for extracting keywords from large text very quickly (much faster than regex and the original flashtext package)
Generalist model for Relation Extraction (Extract any relation types from texts)
A powerful MCP server for comprehensive PDF processing with OCR and diagram detection
Open-source tool for exploring, labeling, and monitoring data for NLP projects.
A Python package for text preprocessing tasks in natural language processing
A minimalist collection of text processing tools for Python 3
A GUI tool to extract and process currency values from text
A library for calculating a variety of features from text using spaCy
BENT: Biomedical Entity Annotator
Melusine is a high-level library for email processing
GATE NLP implementation in Python.
A powerful Python library for intelligent text processing, question generation, and answer generation for LLM fine-tuning datasets
Process and profile text datasets interactively
Snips Natural Language Understanding library
ArchiTXT is a tool for structuring textual data into a valid database model. It is guided by a meta-grammar and uses an iterative process of tree rewriting.
Tools, wrappers, etc. for data science, with a focus on text processing
Powerful and Pythonic PDF processing library based on xpdf-4.02
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.
A fast HTML content extractor based on Mozilla's Readability.js
A library to filter and deduplicate Q&A text datasets from CSV files.
A Python library for detecting and censoring profanity in text
Toolkits for text processing and augmentation for Bangla NLP
Interface with various cloud APIs for language processing, such as translation and text-to-speech