A no-strings inference framework for a Named Entity Recognition (NER) service of wrapped AI models, powered by AREkit and the related text-processing pipelines.
A Python library for processing Yiddish text
State of the art toolchain for natural language processing in French
A PDF-to-text converter based on pdfminer2
Pre-processing text in parallel for Keras in Python.
Text processing library for the Russian language
A Python tool for Bangla text processing
Preprocessing and Extraction of Linguistic Information for Computational Analysis
Farsi Tools: Tools for processing Farsi (Persian) text
Convert images to character art with support for multiple character sets and formats
A library for processing code-mixed text. Still in development!
Framework to process 3 channels in one: Video, Audio & Text
Text filter designed to cleanse text of profanity and offensive language, specifically tailored for Ukrainian, Russian, and Surzhik.
Detect emotions in text.
Easy-to-use text representations extraction library based on the Transformers library.
koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
Utilities for managing nlp models and for processing text-related data at the Wellcome Trust
An Arabic text processing library intended for use in NLP applications.
Easy NLP library for Python
A Python library for readability and textual metrics analysis, supporting multiple languages.
HanDic package for installing via pip.
CLI tool for generating text from images using the Gemma 3 model.
An NLP Python package for computing Boilerplate score and many other text features.
Utility library for analysis & (pre)processing of Yorùbá text
LLM integration for twat
Utilities for text processing tasks with Deep NLP
Linguistic Pattern Lab using spaCy
NK-HanDic package for installing via pip.
This project implements a local-first RAG chat system that reads and processes various text-based log files. It splits the content into manageable chunks, generates embeddings using Ollama or OpenAI, and allows users to interactively query the logs for specific information. The application features a customizable response format and supports configuration for user preferences.
Powered by patented artificial intelligence and machine learning algorithms, LEADTOOLS is a collection of comprehensive toolkits to integrate recognition, document, medical, imaging, and multimedia technologies into desktop, server, tablet, web and mobile solutions.
A structural approach to signal ML
A CLI tool to prepare code as context for AI assistants and other purposes.
Library designed to process text with various filter criteria
A REST API for running Large Language Models
A text-based adventure game engine with natural language processing
Thai Nested Named Entity Recognition
Sketch Grammar Explorer (Sketch Engine API wrapper)
TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts.
A utility library for processing Vietnamese texts
textcleaner: text-data pre-processing library
pdfalign is a very simple tool to grid-align extracted PDF text. This is useful for invoice table extraction or further processing with LLMs / RAG systems.
SONATA: SOund and Narrative Advanced Transcription Assistant
Natural Language Processing Utility Functions
A tool to convert single or mass PDFs to datasets for language analysis, including a toolbox of text and NLP pre-processing options
Bedrock is a high-level text pre-processing API written in Python that can run on NLTK or spaCy as its backend.
A text processing pipeline for Scripture.
A wrapper for the wordcloud module for creating Persian (and other RTL language) word clouds.
A downloader for textual corpora, for use in digital humanities, corpus linguistics, and natural language processing.
A powerful tool to extract text, tables, charts, and formulas from documents and convert them into Markdown format, ideal for improving LLM accuracy and for versatile document processing.