Unsupervised Korean Natural Language Processing Toolkits
An augmentation library based on spaCy for joint augmentation of text and labels.
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
Phrase Tree from Natural Language Toolkit
Text2Text Language Modeling Toolkit
Effortless LLM extraction from documents
Text processing with pandas DataFrames.
Tools, wrappers, etc. for data science, with a concentration on text processing
A text-to-intent parsing framework.
A neural network intent parser
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.
SONATA: SOund and Narrative Advanced Transcription Assistant
Accurately extract complete noun phrases with customisation and structural output.
A library for calculating a variety of features from text using spaCy
An AI-powered tool to clean manga panels.
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.
Open-source tool for exploring, labeling, and monitoring data for NLP projects.
Preprocessing and Extraction of Linguistic Information for Computational Analysis
HuSpaCy: industrial strength Hungarian natural language processing
Core libraries for natural language processing
Melusine is a high-level library for email processing
A Python wrapper for the Doc2X API that comes with native text processing (to improve text recall in RAG).
Aspose.PSD for Python via .NET is a standalone API to read, write, process, and convert Adobe Photoshop PSD and PSB files without installing Adobe Photoshop®, and AI files without Adobe Illustrator®
Wrappers for including pre-trained transformers in spaCy pipelines
A Python module implementing the Rapid Automatic Keyword Extraction algorithm.
Breame is a lightweight Python package with a number of tools to aid in the detection of words that have dual spellings and meanings in British and American English.
Generalist model for Relation Extraction (Extract any relation types from texts)
An extensible tool to process legal citations in text
BENT: Biomedical Entity Annotator
Library designed as a Python wrapper that brings Rust's text processing power to Python
Parser for dependency trees
Text processing tool for detecting Danish CPR-numbers.
Process and profile text datasets interactively
Parses unstructured recipe ingredient text into standardized quantities, units, and foods
Flesch-Kincaid readability scoring algorithm
A Python package for text preprocessing tasks in natural language processing
This module, part of the `abstract_essentials` package, provides a collection of utility functions for working with images and PDFs, including loading and saving images, extracting text from images, capturing screenshots, processing PDFs, and more.
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
Extracts the Machine Readable Zone (MRZ) data from document images
A package for extracting keywords from large text very quickly (much faster than regex and the original flashtext package)
Simple Text-Processing and -Analytics Command Line Tool made in Python.
A Python library for detecting and censoring profanity in text