Text Processing

twat-llm

LLM integration for twat

text-processing

ruts

Russian Texts Statistics

yurenizer

A library for standardizing terms with spelling variations using a synonym dictionary.

text-processing

karhu

An AI assistant with PDF processing, web browsing, and speech capabilities

textfromimage

Get descriptions of images from OpenAI, Azure OpenAI, and Anthropic Claude models with support for local files and batch processing.

computer-vision

awkg

awkg is an awk-like text-processing tool powered by the Python language

img-characterize

Convert images to character art with support for multiple character sets and formats

image processing

image conversion

long2short

A flexible text summarization library to summarize long documents supporting multiple LLM providers

text-processing

avesta

A Python library for text processing

belt-nlp

BELT (BERT For Longer Texts). BERT-based text classification model for processing texts longer than 512 tokens.

natural-language-processing

text-classification

transfer-learning

morethansentiments

An NLP Python package for computing the Boilerplate score and many other text features.

Natural Language Processing

biliscribe

An MCP server that extracts and formats Bilibili video content into structured text, optimized for LLM processing and analysis.

thai-nner

Thai Nested Named Entity Recognition

natural language processing

text processing

texy

Supercharge text processing

text-processing-ml

A library for processing text for machine learning

bkit

A Python tool for Bangla text processing

bangla text processing

shallow parsing

stark-engine

S.T.A.R.K - Speech and Text Algorithmic Recognition Kit. Modern framework for creating powerful voice assistants.

natural-language-processing

natural-language

spam-detection-system

A sophisticated spam detection system using Multinomial Naive Bayes classifier trained on labeled emails. The system processes text through a machine learning pipeline that converts raw text into numerical features for accurate classification.

wordcloud-fa

A wrapper for the wordcloud module for creating Persian (and other RTL-language) word clouds.

kathairo

A text processing pipeline for Scripture.

mdify

A powerful tool to extract text, tables, charts, and formulas from documents and convert them into Markdown format, ideal for improving LLM accuracy and for versatile document processing.

ragql

This project implements a local-first RAG chat system that reads and processes various text-based log files. It splits the content into manageable chunks, generates embeddings using Ollama or OpenAI, and allows users to interactively query the logs for specific information. The application features a customizable response format and supports configuration for user preferences.

charboundary

Fast character-based boundary detection for sentences and paragraphs

text segmentation

sentence boundary detection

paragraph detection

text processing

pre-processing-text-basic-tools-br

A toolkit for basic Natural Language Processing tasks.

wordwright

A package for text processing

sockit

Sockit is a natural-language processing toolkit for modeling structured occupation information and Standard Occupational Classification (SOC) codes in unstructured text from job titles, job postings, and resumes.

shifterator

Interpretable data visualizations for understanding how texts differ at the word level

natural language processing

sentiment analysis

information theory

computational social science

digital humanities

atai-whisper-tool

OpenAI Whisper with Apple MPS support

text-extraction

audio-processing

textextraction

Extract and process text from images and PDFs

text extraction

image processing

document processing

table detection

tamilkavi

A command-line tool for exploring Tamil Kavithaigal.

text processing

atai-ebook-tool

A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structured JSON file.

text-extraction

document-processing

dsutils

Data science utilities for preprocessing data to feed various models, pipelining, and converting time data formats

natural language processing

dspeech

A Speech-to-Text toolkit with VAD, punctuation, and emotion classification

speech processing

speech recognition

speech synthesis

nlp-text-cleaner

Clean text for NLP projects

natural-language-processing

text-preprocessing

gca-analyzer

A package for Group Conversation Analysis with improved text processing and visualization

pandoctools

Profile manager of text processing pipelines: Pandoc filters, any text CLI filters. Atom+Markdown+Pandoc+Jupyter workflow, export to ipynb.

artless-template

Artless and small template library for server-side rendering.

artless-template

template engine

text processing

kuzukiri

Natural Language Processing

Text Segmentation

iambic

Data extraction and rendering library for Shakespearean text.

text-processing

sabhi-utils

Self-rolled utils to be used with the Sabhi ML Services

image utils text processing

gerbolyze

A high-resolution image-to-PCB converter. Gerbolyze plots SVG, PNG and JPG onto existing gerber files. It handles almost the full SVG spec and deals with text, path outlines, patterns, arbitrary paths with self-intersections and holes, etc. fully automatically. It can vectorize raster images both by contour tracing and by grayscale dithering. All processing is done at the vector level without intermediate conversions to raster images, accurately preserving the input.

thai2transformers

Pretraining transformer-based Thai language models

natural language processing

text processing

mathtext

Natural Language Understanding (text processing) for math symbols, digits, and words with a Gradio user interface and REST API.

text-processing-util-mds24

Utility functions for text processing.

my-collection

A Python package for text processing and argument parsing

pint-lib

A package for automated processing of (pubmed) text with LLM

itnpy2

A simple, deterministic, and extensible approach to inverse text normalization for numbers

inverse text normalization

natural language processing

speech recognition

nlpiper

NLPiper, a lightweight package integrated with a universe of frameworks to pre-process documents.

natural language processing

computational linguistics

pydata-wrangler

Wrangle messy data into pandas DataFrames, with a special focus on text data and natural language processing

pydata-wrangler

natural language processing

clusx

Bayesian nonparametric toolkit for text clustering, analysis, and benchmarking with advanced embedding models and statistical validation.

natural-language-processing

machine-learning

dirichlet-process
