Text Processing

paxter

Paxter is a document-first, text pre-processing mini-language toolchain, loosely inspired by @-expressions in Racket

frenchnlp

State-of-the-art toolchain for natural language processing in French

text mining
nlp
corpus
french

puristaa

Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.

compression
developer-tools
text-processing
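
One common reading of "shared prefix compression of ordered string sequences" is front coding: each string is stored as the length of the prefix it shares with the previous string, plus the remaining suffix. The sketch below illustrates that general technique only. It is an assumption about the idea, not puristaa's actual API or algorithm, and the function names `front_encode` and `front_decode` are hypothetical.

```python
from os.path import commonprefix


def front_encode(strings):
    """Encode an ordered sequence as (shared_prefix_length, suffix) pairs.

    Hypothetical helper for illustration; not part of puristaa.
    """
    encoded = []
    previous = ""
    for current in strings:
        # commonprefix compares character by character, so it works on any strings.
        shared = len(commonprefix([previous, current]))
        encoded.append((shared, current[shared:]))
        previous = current
    return encoded


def front_decode(pairs):
    """Rebuild the original sequence from (shared_prefix_length, suffix) pairs."""
    decoded = []
    previous = ""
    for shared, suffix in pairs:
        current = previous[:shared] + suffix
        decoded.append(current)
        previous = current
    return decoded


words = ["car", "card", "care", "cat"]
pairs = front_encode(words)
restored = front_decode(pairs)
```

The scheme pays off when the input is sorted, since neighbouring strings then tend to share long prefixes; on unordered input most pairs degrade to `(0, whole_string)`.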

minification-station

Designed to process and combine multiple files within a specified directory into a single output file.

automation
file processing
minification
text processing

piraye

A utility for normalizing Persian, Arabic, and English texts

NLP
Natural Language Processing
Tokenizing
Normalization

pdf2textbox

A PDF-to-text converter based on pdfminer2

PDF
pdfminer2
PDFconversion
text-processing

linesieve

An unholy blend of grep, sed, awk, and Python.

grep
sed
awk
cli
command-line
terminal

slang

A structural approach to signal ML

sound recognition
machine learning
language
audio
signal processing
natural language processing

pre-processing-text-basic-tools-br

Toolkit for basic Natural Language Processing tasks.

extralit

Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-the-loop.

literature-review
pdf-extraction
natural-language-processing
text-labeling
data-extraction
artificial-intelligence

bulk-ner

A no-strings inference framework for Named Entity Recognition (NER): a service of wrapped AI models, powered by AREkit and the related text-processing pipelines.

natural language processing
named entity recognition
ner

html-to-markdown

Convert HTML to markdown

beautifulsoup
converter
html
markdown
text-processing

This module, part of the `abstract_essentials` package, provides a collection of utility functions for working with images and PDFs, including loading and saving images, extracting text from images, capturing screenshots, processing PDFs, and more.

sodata

Processing web text data for NLP and LLMs

text-processing-util-mds24

Utility functions for text processing.

artless-template

Artless and small template library for server-side rendering.

artless-template
template engine
text processing
utility

cmtt

A library for processing Code Mixed Text. Still in development!

bibleparser

Parse a mistranscribed dictated bible reference into a standard format

bible
parser
reference
passage
dictation
transcription

moverscore

MoverScore: Evaluating text generation with contextualized embeddings and earth mover distance

machine translation
evaluation
NLP
natural language processing
computational linguistics

retexto

Fast text processing

nlp-text-cleaner

Clean text for NLP projects

nlp
text cleaning
natural-language-processing
text-cleaning
text-preprocessing

roter

Rotate and combine tables (Danish: Roter og kombiner borde).

developer-tools
text-processing
validation
verification

text-toolbox

A simple text processing toolkit

text-alignment-tool

A tool for performing complex text alignment processes.

alignment
needleman
wunsch
pipeline

wellcomeml

Utilities for managing NLP models and for processing text-related data at the Wellcome Trust

tokenize-text

Tokenizing and processing text inputs with transformer models

tokenization
text-processing
nlp
transformers

uroman

uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.

NLP
computational linguistics
machine translation
natural language processing
romanization
string similarity

tpro

tpro processes transcripts from speech-to-text services and outputs to various formats.

pdfalign

pdfalign is a very simple tool to grid-align extracted PDF text. This is useful for invoice table extraction or further processing with LLMs / RAG systems.

qaznltk

A package for working with Kazakh language text processing.

argilla-v1

Open-source tool for exploring, labeling, and monitoring data for NLP projects.

data-science
natural-language-processing
text-labeling
data-annotation
artificial-intelligence
knowledge-graph

po-meta-magic

A Python package for extracting and processing text from images.

gptj

A GPT-J API to use with Python 3 to generate text, blogs, code, and more. (Note: starting with version 3.0.7, the API uses the old domain again, so there might be some issues with limits.)