LLM integration for twat
Russian Texts Statistics
A library for standardizing terms with spelling variations using a synonym dictionary.
Get descriptions of images from OpenAI, Azure OpenAI, and Anthropic Claude models with support for local files and batch processing.
awkg is an awk-like text-processing tool powered by Python
Convert images to character art with support for multiple character sets and formats
A flexible text summarization library for long documents, with support for multiple LLM providers
A Python library for text processing
BELT (BERT For Longer Texts). BERT-based text classification model for processing texts longer than 512 tokens.
An NLP Python package for computing Boilerplate score and many other text features.
An MCP server that extracts and formats Bilibili video content into structured text, optimized for LLM processing and analysis.
Thai Nested Named Entity Recognition
Supercharge text processing
A library for processing text for machine learning
A Python tool for Bangla text processing
S.T.A.R.K - Speech and Text Algorithmic Recognition Kit. Modern framework for creating powerful voice assistants.
A sophisticated spam detection system using Multinomial Naive Bayes classifier trained on labeled emails. The system processes text through a machine learning pipeline that converts raw text into numerical features for accurate classification.
A wrapper for the wordcloud module for creating word clouds in Persian (and other RTL languages).
A text processing pipeline for Scripture.
A powerful tool to extract text, tables, charts, and formulas from documents and convert them into Markdown format, ideal for improving LLM accuracy and for versatile document processing.
This project implements a local-first RAG chat system that reads and processes various text-based log files. It splits the content into manageable chunks, generates embeddings using Ollama or OpenAI, and allows users to interactively query the logs for specific information. The application features a customizable response format and supports configuration for user preferences.
Fast character-based boundary detection for sentences and paragraphs
Toolkit for basic Natural Language Processing tasks.
A package for text processing
Sockit is a natural-language processing toolkit for modeling structured occupation information and Standard Occupational Classification (SOC) codes in unstructured text from job titles, job postings, and resumes.
Interpretable data visualizations for understanding how texts differ at the word level
OpenAI Whisper with Apple MPS support
Extract and process text from images and PDFs
A command-line tool for exploring Tamil Kavithaigal.
A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structured JSON file.
Data science utilities for preprocessing data to feed various models, pipelining, and converting time data formats
A Speech-to-Text toolkit with VAD, punctuation, and emotion classification
Clean text for NLP projects
A package for Group Conversation Analysis with improved text processing and visualization
Profile manager of text processing pipelines: Pandoc filters, any text CLI filters. Atom+Markdown+Pandoc+Jupyter workflow, export to ipynb.
Artless and small template library for server-side rendering.
Data extraction and rendering library for Shakespearean text.
Self-rolled utilities for use with the Sabhi ML Services
A high-resolution image-to-PCB converter. Gerbolyze plots SVG, PNG and JPG onto existing gerber files. It handles almost the full SVG spec and deals with text, path outlines, patterns, arbitrary paths with self-intersections and holes, etc. fully automatically. It can vectorize raster images both by contour tracing and by grayscale dithering. All processing is done at the vector level without intermediate conversions to raster images, accurately preserving the input.
Pretraining transformer-based Thai language models
Natural Language Understanding (text processing) for math symbols, digits, and words with a Gradio user interface and REST API.
Utility functions for text processing.
A Python package for text processing and argument parsing
A package for automated processing of (PubMed) text with an LLM
A simple, deterministic, and extensible approach to inverse text normalization for numbers
NLPiper, a lightweight package integrated with a universe of frameworks to pre-process documents.
Wrangle messy data into pandas DataFrames, with a special focus on text data and natural language processing
Bayesian nonparametric toolkit for text clustering, analysis, and benchmarking with advanced embedding models and statistical validation.