Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. It currently supports US, Canadian, and British addresses.

quantulum3

Extract quantities from unstructured text.

information extraction

quantities

units

measurements

nlp

natural language processing

gliner

Generalist model for NER (extracts any entity type from text)

named-entity-recognition

ner

data-science

natural-language-processing

artificial-intelligence

nlp

zhon

Zhon provides constants used in Chinese text processing.

sumy

Module for automatic summarization of text documents and HTML pages.

data mining

automatic summarization

data reduction

web-data extraction

NLP

natural language processing

demoji

Accurately remove and replace emojis in text strings

emoji

emojis

nlp

natural language processing

unicode

nemo-text-processing

NeMo text processing for ASR and TTS

snownlp

Python library for processing Chinese text

chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library

chunking

rag

retrieval-augmented-generation

nlp

natural-language-processing

text-processing

summa

A text summarization and keyword extraction package based on TextRank

natural language processing

automatic summarization

nostril-detector

Nonsense String Evaluator

program-analysis

text-processing

gibberish-detection

identifiers

textacy

NLP, before and after spaCy

razdel

Splits Russian text into tokens, sentences, and sections. Rule-based.

nlp

natural language processing

reliq

Python ctypes bindings for reliq

indic-nlp-library

The goal of the Indic NLP Library is to build Python-based libraries for common text processing and natural language processing tasks in Indian languages.

uroman

uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.

NLP

computational linguistics

machine translation

natural language processing

romanization

string similarity

realtimestt

A fast Voice Activity Detection and Transcription System

voice-activity-detection

VAD

toolwrapper

A base class for wrapping text-processing tools

subprocess text tool wrapper

html-to-markdown

Convert HTML to markdown

urduhack

Natural Language Processing (NLP) library for the Urdu language.

urdu machine learning text pre-processing tensorflow nlp

zalgolib

A Python library for a _FULL_ Zalgo experience

mosestokenizer

Wrappers for several pre-processing scripts from the Moses toolkit.

text tokenization pre-processing

addheader

A command to manage a header section for a source code tree

software engineering

text processing

utilities

dragonmapper

Identification and conversion functions for Chinese text processing

stream2sentence

Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.

nlup

Core libraries for natural language processing

nlp

natural language processing

text

text processing

artificial intelligence

abbreviation-extractor

A library for extracting abbreviations from text.

pystempel

Polish stemmer.

NLP

natural language processing

computational linguistics

stemming

linguistics

language

analiticcl

Analiticcl is an approximate string-matching (fuzzy-matching) system that can be used to find variants for spelling correction or text normalisation.

preprocessing

Pre-processing package for text strings

text pre-processing

thongna

Blazing-fast Thai text processing library powered by Rust

onnxtr

ONNX Text Recognition (OnnxTR): a docTR ONNX wrapper for high-performance OCR on documents.

stam

STAM is a library for dealing with standoff annotations on text; this is the Python binding.

adapt-parser

A text-to-intent parsing framework.

natural language processing

aspose-tasks

Aspose.Tasks for Python via .NET is a native library that enables developers to add MS Project file-processing capabilities to their applications.