semantic-sh is a SimHash implementation to detect and group similar texts by taking power of word vectors and transformer-based language models such as BERT.
Toolkit for extracting logical chunks from complex document pages.
Early intent detection using n-gram language models
Amazon Translate Language Plugin
Language detection component (langdetect) for PCU project
Translate text, files, markup and BeautifulSoup.
A simple Language Detection Library
Post-hoc LLM watermarking toolkit
PyLaDe - Language Detection tool written in Python.
A Pygments plugin providing syntax highlighting for YARA-L 2.0, a language used to create detection rules for Google Security Operations (SecOps)
Universal text analysis module for detecting language, meaningfulness, and structure
CLD: language detection heads for ASR models
Fast, accurate language detection using static LLM embeddings
Multi-language code duplicate detector using semantic embeddings and Tree-Sitter parsing
LLM-based Universal Measure Of Salience - Testing LLMs' ability to capture image salience
API for natural language processing.
A comprehensive AI redteaming framework for testing small language models across multiple safety and alignment dimensions, including deceptive alignment detection, evaluation awareness testing, data exfiltration prevention, and meta-cognition analysis with statistical significance testing.
A fast language detection package for Ikinyarwanda(Native language of Rwandans)
Language-agnostic synchronization of subtitles with video via speech detection, and other subtitle tools.
Lightweight language detection (rule-based + ML hybrid)
A module for language statistics. Currently has blob languages detection.
Python bindings around Google Chromium's embedded compact language detection library (CLD2)
Fine-tuned BERT model for language detection.
Detect Languages (WIP)
Implementation of SCALE metric and ScreenEval
dots.ocr: Multilingual Document Layout Parsing in one Vision-Language Model
Language identification via character n-gram profiles. Candidate gating guided by priors and linguistic cues, then probability estimation for each language. Supports 161 languages. Returns a top-1 ISO code or a probability-ordered list.
Get Tweets by IDs through Twitter's API
Multi-language CLI tool for detecting concurrency and thread safety issues
Multi-language CLI tool for detecting concurrency and thread safety issues
This plugin detects language from given string.
a tool to detecting the language for a small piece of unicode text without any dependency to other libraries.
Language-agnostic synchronization of subtitles with video via speech detection.
A lightweight tool detection system for parsing user intents into structured tool calls
This project implements a client for the Project Oxford API's.
고급 텍스트 분석을 위한 Python 패키지
OCR, layout analysis, and line detection in 90+ languages
Retromorphic Testing Framework (RTF) for detecting hallucinations in Large Language Models (LLMs)
A tool to detect subtitle languages in media files
A machine learning-based programming language detection library
A library for detecting profanity from different languages.
A python module which lets you use Google Translate (translation, transliteration, defintion, language detection, etc.) by parsing the website.
dots.ocr: Multilingual Document Layout Parsing in one Vision-Language Model
0.0.6 remove accelerate. dots.ocr: Multilingual Document Layout Parsing in one Vision-Language Model
A lightweight language detection library similar to fasttext and Google's implementation with ONNX support
LexiDecay is a semi-supervised lexical weighting model for unstructured text. It classifies content by adaptive word-frequency decay and soft lexical scoring. Fast (O(n·m)), language-flexible, and training-free — ideal for topic classification, semantic filtering, and intent detection.
Hybrid static + LLM codebase bug detector for multi-language projects.
A language detector for short string or chat.