Paxter is a document-first, text pre-processing mini-language toolchain, loosely inspired by @-expressions in Racket
State-of-the-art toolchain for natural language processing in French
Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.
Designed to process and combine multiple files within a specified directory into a single output file.
A utility for normalizing Persian, Arabic and English texts
A PDF-to-text converter based on pdfminer2
A structural approach to signal ML
Toolkit for basic Natural Language Processing tasks.
Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-the-loop.
A no-strings inference framework for a Named Entity Recognition (NER) service that wraps AI models, powered by AREkit and its related text-processing pipelines.
A Python library for text processing
Convert HTML to markdown
This module, part of the `abstract_essentials` package, provides a collection of utility functions for working with images and PDFs, including loading and saving images, extracting text from images, capturing screenshots, processing PDFs, and more.
Processing web text data for NLP and LLMs
Utility functions for text processing.
Artless and small template library for server-side rendering.
A library for processing Code Mixed Text. Still in development!
Parse a mistranscribed dictated bible reference into a standard format
MoverScore: Evaluating text generation with contextualized embeddings and earth mover distance
Fast text processing
Clean text for NLP projects
Rotate and combine tables (Danish: Roter og kombiner borde).
A simple text processing toolkit
A tool for performing complex text alignment processes.
Utilities for managing NLP models and for processing text-related data at the Wellcome Trust
Tokenizing and processing text inputs with transformer models
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
tpro processes transcripts from speech-to-text services and outputs to various formats.
pdfalign is a very simple tool to grid-align extracted PDF text. This is useful for invoice table extraction or further processing with LLMs / RAG systems.
A package for working with Kazakh language text processing.
Open-source tool for exploring, labeling, and monitoring data for NLP projects.
A Python package for extracting and processing text from images.
A GPT-J API to use with Python 3 to generate text, blogs, code, and more (Note: Starting with version 3.0.7 the API is using the old domain again, so there might be some issues with limits)
A tool to convert single or mass PDFs to datasets for language analysis, including a toolbox of text and NLP pre-processing options
A Python tool for Bangla text processing
Utilities for text processing tasks with Deep NLP
Word embeddings with meaningful dimensions for better explainability.
Sketch Grammar Explorer (Sketch Engine API wrapper)
koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
A downloader for textual corpora, for use in digital humanities, corpus linguistics, and natural language processing.
textcleaner: text-data pre-processing library
Easy NLP library for Python
Instruction-Guided Image Captioning
Natural Language Processing Utility Functions
A Python package for converting numbers and floats (up to 15 digits) into Georgian text.
A package for text processing
Python client for Aeca database