
sign-language-translator
Build Custom Translators and Translate between Sign Language & Text with AI.
Sign language consists of gestures and expressions used mainly by the hearing-impaired to talk. This project is an effort to bridge the communication gap between the hearing and the hearing-impaired community using Artificial Intelligence.
This package comes with an extensible rule-based text-to-sign translation system that can be used to generate training data for Deep Learning models for both sign to text & text to sign translation.
[!Tip] To create a rule-based translation system for your regional language, you can inherit the TextLanguage and SignLanguage classes and pass them as arguments to the ConcatenativeSynthesis class. To write sample texts of supported words, you can use our language models. Then, you can use that system to fine-tune our deep learning models.
See the documentation and our datasets for details.
The project tackles both directions of translation:

Sign language to text:
- Extract pose vectors from sign videos using the slt.models.video_embedding sub-package and the $ slt embed command.
- If labeled data is scarce, synthesize a training dataset with the rule-based translator (slt.models.ConcatenativeSynthesis).
- Fine-tune a sign-to-text model (slt.models.sign_to_text), or the encoder of any multilingual seq2seq model, on your dataset.

Text to sign language (there are two approaches to this problem):
1. Rule Based Concatenation
   - Parse the input text with slt.languages.TextLanguage (see the slt.languages.text sub-package for details).
   - Map tokens to signs with slt.languages.SignLanguage (see the slt.languages.sign sub-package for details).
   - Use slt.models.ConcatenativeSynthesis for translation.
2. Deep learning (seq2seq)
   - Generate a sequence of pose vectors directly from text (shape = (time, num_landmarks * num_coordinates); see the sketch below).
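For intuition, here is a minimal sketch of that flattened pose-vector layout (plain numpy with illustrative sizes, not the library's API):

import numpy as np

# illustrative sizes: 30 frames, 75 landmarks, 5 values per landmark
num_frames, num_landmarks, num_coordinates = 30, 75, 5
landmarks = np.random.rand(num_frames, num_landmarks, num_coordinates)

# flatten each frame into one feature vector: shape = (time, num_landmarks * num_coordinates)
pose_vectors = landmarks.reshape(num_frames, num_landmarks * num_coordinates)
print(pose_vectors.shape)  # (30, 375)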
For our datasets & conventions, see the sign-language-datasets repo and its releases. See this documentation for more on building a dataset of Sign Language videos (or motion capture gloves' output features).
Your data should include the components described in the datasets repo above (e.g. word-to-sign mappings and recordings of the signs). Try to capture variations in signs in a scalable, diversity-accommodating way that also supports sign language standardization efforts.
pip install sign-language-translator
The package ships with some optional dependencies as well (e.g. deep_translator for synonym finding and mediapipe for a pretrained pose extraction model). Install them by appending [all], [full], [mediapipe] or [synonyms] to the project name in the command (e.g. pip install sign-language-translator[full]).

Editable mode (git clone):
git clone https://github.com/sign-language-translator/sign-language-translator.git
cd sign-language-translator
pip install -e ".[all]"
pip install -e git+https://github.com/sign-language-translator/sign-language-translator.git#egg=sign_language_translator
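To sanity-check the installation (assuming the package exposes the usual __version__ attribute; the CLI also supports slt --version, as shown in the help output below):

import sign_language_translator as slt
print(slt.__version__)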
Head over to slt.readthedocs.io to see the detailed usage in Python, CLI and gradio GUI. See the test cases or the notebooks repo to see the internal code in action.
Individual models deployed on HuggingFace Spaces:
import sign_language_translator as slt
# The core model of the project (rule-based text-to-sign translator)
# which enables us to generate synthetic training datasets
model = slt.models.ConcatenativeSynthesis(
    text_language="urdu", sign_language="pk-sl", sign_format="video"
)
text = "یہ بہت اچھا ہے۔" # "this-very-good-is"
sign = model.translate(text) # tokenize, map, download & concatenate
sign.show()
model.sign_format = slt.SignFormatCodes.LANDMARKS
model.sign_embedding_model = "mediapipe-world"
# ==== English ==== #
model.text_language = slt.languages.text.English()
sign_2 = model.translate("This is an apple.")
sign_2.save("this-is-an-apple.csv", overwrite=True)
# ==== Hindi ==== #
model.text_language = slt.TextLanguageCodes.HINDI
sign_3 = model.translate("कैसे हैं आप?") # "how-are-you"
sign_3.save_animation("how-are-you.gif", overwrite=True)
"یہ بہت اچھا ہے۔" (this-very-good-is) | "कैसे हैं आप?" (how-are-you) |
import sign_language_translator as slt
# sign = slt.Video("path/to/video.mp4")
sign = slt.Video.load_asset("pk-hfad-1_aap-ka-nam-kya(what)-hy") # your name what is? (auto-downloaded)
sign.show_frames_grid()
# Extract Pose Vector for feature reduction
embedding_model = slt.models.MediaPipeLandmarksModel() # pip install "sign_language_translator[mediapipe]" # (or [all])
embedding = embedding_model.embed(sign.iter_frames())
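# embedding shape: (n_frames, n_landmarks * 5); reshape below to (frames, landmarks, values) for display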
slt.Landmarks(
    embedding.reshape((-1, 75, 5)),
    connections="mediapipe-world",
).show()
# # Load sign-to-text model (pytorch) (COMING SOON!)
# translation_model = slt.get_model(slt.ModelCodes.Gesture)
# text = translation_model.translate(embedding)
# print(text)
# custom translator (https://slt.readthedocs.io/en/latest/#building-custom-translators)
help(slt.languages.SignLanguage)
help(slt.languages.text.Urdu)
help(slt.models.ConcatenativeSynthesis)
$ slt
Usage: slt [OPTIONS] COMMAND [ARGS]...
Sign Language Translator (SLT) command line interface.
Documentation: https://sign-language-translator.readthedocs.io
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
assets Assets manager to download & display Datasets & Models.
complete Complete a sequence using Language Models.
embed Embed Videos Using Selected Model.
translate Translate text into sign language or vice versa.
Generate training examples: write a sentence with a language model and synthesize a sign language video from it with a single command:
slt translate --model-code rule-based --text-lang urdu --sign-lang pk-sl --sign-format video \
"$(slt complete '<' --model-code urdu-mixed-ngram --join '')"
Text Languages:
Name | Vocabulary | Ambiguous tokens | Signs |
---|---|---|---|
English | 1591 words+phrases | 167 | 776 |
Urdu | 2080 words+phrases | 227 | 776 |
Hindi | 137 words+phrases | 5 | 84 |
Sign Languages:
Name | Vocabulary | Dataset | Parallel Corpus |
---|---|---|---|
Pakistan Sign Language | 776 | 23 hours | details |
Name | Architecture | Description | Input | Output | Web Demo |
---|---|---|---|---|---|
Concatenative Synthesis | Rules + Hash Tables | The core rule-based translator, mainly used to synthesize translation datasets. Initialize it using TextLanguage, SignLanguage & SignFormat objects. | string | slt.Video / slt.Landmarks | |
Name | Architecture | Description | Input format | Output format |
---|---|---|---|---|
MediaPipe Landmarks (Pose + Hands) | CNN based pipelines. See Here: Pose, Hands | Encodes videos into pose vectors (3D world or 2D image) depicting the movements of the performer. | List of numpy images (n_frames, height, width, channels) | torch.Tensor (n_frames, n_landmarks * 5) |
Name | Architecture | Description | Input format | Output format |
---|---|---|---|---|
N-Gram Language Model | Hash Tables | Predicts the next token based on learned statistics about previous N tokens. | List of tokens | (token, probability) |
Transformer Language Model | Decoder-only Transformers (GPT) | Predicts next token using query-key-value attention, linear transformations and soft probabilities. | torch.Tensor (batch, token_ids) List of tokens | torch.Tensor (batch, token_ids, vocab_size) (token, probability) |
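As a rough illustration of the hash-table idea behind the N-Gram Language Model above (plain Python, not the library's API):

from collections import Counter, defaultdict

def train_ngram(tokens, n=2):
    # count which token follows each (n-1)-token context
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i : i + n - 1])][tokens[i + n - 1]] += 1
    return counts

def next_token(counts, context):
    # return the most frequent continuation as (token, probability)
    options = counts[tuple(context)]
    token, freq = options.most_common(1)[0]
    return token, freq / sum(options.values())

counts = train_ngram("how are you . how are they .".split(), n=2)
print(next_token(counts, ["are"]))  # ('you', 0.5)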
Name | Architecture | Description | Input format | Output format |
---|---|---|---|---|
Vector Lookup | HashTable | Finds the token index and returns the corresponding vector. Tokenizes sentences and computes the average vector of known tokens. | string | torch.Tensor (n_dim,) |
To create your own sign language translator, you'll need these essential components:
- Subclass slt.languages.TextLanguage to handle text processing (normalization, tokenization & tagging) for your spoken language.
- Subclass slt.languages.SignLanguage to map text tokens to signs and restructure sentences for your sign language.
- Pass both to the slt.models.ConcatenativeSynthesis class to obtain a rule-based translator object.

Remember to contribute back to the community:
See the code at the Build Custom Translator section in ReadTheDocs or in this notebook.
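A structural sketch of that wiring (class names are placeholders; the abstract methods you must implement are listed by the help() calls shown earlier and in the docs):

import sign_language_translator as slt

class MyTextLanguage(slt.languages.TextLanguage):
    ...  # implement text normalization, tokenization & tagging for your spoken language

class MySignLanguage(slt.languages.SignLanguage):
    ...  # implement token-to-sign mapping & sentence restructuring for your sign language

# once the abstract methods are implemented, wire both into the rule-based translator:
# model = slt.models.ConcatenativeSynthesis(
#     text_language=MyTextLanguage(),
#     sign_language=MySignLanguage(),
#     sign_format="video",
# )
# sign = model.translate("a sentence in your language")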
sign-language-translator
├── README.md
├── pyproject.toml
├── requirements.txt
├── docs
│   └── *
├── tests
│   └── *
│
└── sign_language_translator
    ├── cli.py                   `> slt` command line interface
    ├── assets (auto-downloaded)
    │   └── *
    │
    ├── config
    │   ├── assets.py            download, extract and remove models & datasets
    │   ├── colors.py            named RGB tuples for visualization
    │   ├── enums.py             string short codes to identify models & classes
    │   ├── settings.py          global variables in repository design-pattern
    │   ├── urls.json
    │   └── utils.py
    │
    ├── languages
    │   ├── utils.py
    │   ├── vocab.py             reads word mapping datasets
    │   ├── sign
    │   │   ├── mapping_rules.py            strategy design-pattern for word to sign mapping
    │   │   ├── pakistan_sign_language.py
    │   │   └── sign_language.py            Base class for text to sign mapping and sentence restructuring
    │   │
    │   └── text
    │       ├── english.py
    │       ├── hindi.py
    │       ├── text_language.py            Base class for text normalization, tokenization & tagging
    │       └── urdu.py
    │
    ├── models
    │   ├── _utils.py
    │   ├── utils.py
    │   ├── language_models
    │   │   ├── abstract_language_model.py
    │   │   ├── beam_sampling.py
    │   │   ├── mixer.py                    wrap multiple language models into a single object
    │   │   ├── ngram_language_model.py     uses hash-tables & frequency to predict next token
    │   │   └── transformer_language_model
    │   │       ├── layers.py
    │   │       ├── model.py                decoder-only transformer with controllable vocabulary
    │   │       └── train.py
    │   │
    │   ├── sign_to_text
    │   ├── text_to_sign
    │   │   ├── concatenative_synthesis.py  join sign clip of each word in text using rules
    │   │   └── t2s_model.py                Base class
    │   │
    │   ├── text_embedding
    │   │   ├── text_embedding_model.py     Base class
    │   │   └── vector_lookup_model.py      retrieves word embedding from a vector database
    │   │
    │   └── video_embedding
    │       ├── mediapipe_landmarks_model.py  2D & 3D coordinates of points on body
    │       └── video_embedding_model.py      Base class
    │
    ├── text
    │   ├── metrics.py           numeric score techniques
    │   ├── preprocess.py
    │   ├── subtitles.py         WebVTT
    │   ├── synonyms.py
    │   ├── tagger.py            classify tokens to assist in mapping
    │   ├── tokenizer.py         break text into words, phrases, sentences etc
    │   └── utils.py
    │
    ├── utils
    │   ├── archive.py           zip datasets
    │   ├── arrays.py            common interface & operations for numpy.ndarray and torch.Tensor
    │   ├── download.py
    │   ├── parallel.py          multi-threading
    │   ├── tree.py              print file hierarchy
    │   └── utils.py
    │
    └── vision
        ├── _utils.py
        ├── utils.py
        ├── landmarks
        │   ├── connections.py   drawing configurations for different landmarks models
        │   ├── display.py       visualize points & lines on 3D plot
        │   └── landmarks.py     wrapper for sequence of collection of points on body
        │
        ├── sign
        │   └── sign.py          Base class to wrap around sign clips
        │
        └── video
            ├── display.py       jupyter notebooks inline video & pop-up in CLI
            ├── transformations.py  strategy design-pattern for image augmentation
            ├── video_iterators.py  adapter design-pattern for video reading
            └── video.py
See our datasets & conventions here.
Add the string short codes of your classes and models into enums.py, and update factory functions like get_model() and get_.*_language() accordingly. Pick a # ToDo from the code, fix a # bug / # type: ignore, or take on anything from the roadmap below.

# 0.8.2: landmark augmentation (zoom, rotate, move, noise, duration, rectify, stabilize, __repr__)
# 0.8.3: trim signs before concatenation, insert transition frames
# 0.8.4: plotly & three.js/mixamo display , pass matplotlib kwargs all the way down
# 0.8.5: subtitles/captions
# 0.8.6: stabilize video batch using landmarks, draw/overlay 2D landmarks on video/image
# mock test cases which require internet when internet isn't available / test for dummy languages
# improve language classes architecture (for easy customization via inheritance) | clean up slt.languages.text.* code
# ? add a generic SignedTextLanguage class which just maps text lang to signs based on mapping.json ?
# add progress bar to slt.models.MediaPipeLandmarksModel
# rename 'country' to 'region' & rename wordless_wordless to wordless.mp4 # insert video type to archives: .*.videos-`(dictionary|sentences)(-replication)?`-mp4.zip
# decide mediapipe-all = world & image concatenated in landmark dim or feature dim?
# expand dictionary video data by scraping everything
# upload the 12 person dictionary replication landmark dataset
# 0.9.1: TransformerLanguageModel - Drop space tokens & bidirectional prediction. infer on specific vocab only .... pretrain on max vocab and mixed data. finetune on balanced data (wiki==news==novels==poetry==reviews) .... then RLHF on coherent generations (Comparison data: generate 100 examples (at high temperature), cut them at random points, regenerate the rest, and label these pairs for coherence[ and novelty].) (use same model/BERT as reward model with regression head.) (ranking loss with margin) (each token is a time step) (min KL Divergence from base - exploration without mode collapse) ... label disambiguation data and freeze all and finetune disambiguated_tokens_embeddings (disambiguated embedding: word ± 0.1*(sense1 - sense2).normalize()) .... generate data on broken compound words and finetune their token_embeddings ... generate sentences of supported words and translate to other languages.
# 0.9.2: sign to text with custom seq2seq transformer
# 0.9.3: pose vector generation from text with custom seq2seq transformer
# 0.9.4: sign to text with fine-tuned whisper
# 0.9.5: pose vector generation with fine-tuned mBERT
# 0.9.6: custom 3DLandmark model (training data = mediapipe's output on activity recognition or any dataset)
# 1.0.0: all models trained on custom landmark model
# 🎉
# 1.0.1: video to text model (connect custom landmark model with sign2text model and finetune)
# 1.1.0: motion transfer
# 1.1.1: custom pose2video: stable diffusion or GAN?
# 1.2.0: speech to sign
# 1.2.1: sign to speech
Issues
# bugfix: inaccurate num_frames in video file metadata
# bugfix: Expression of type "Literal[False]" cannot be assigned to member "SHOW_DOWNLOAD_PROGRESS" of class "Settings"
# feature: video transformations (e.g. stabilization with image pose landmarks, watermark text/logo)
# improvement: SignFilename.parse("videos/pk-hfad-1_airplane.mp4").gloss # airplane
Miscellaneous
# parallel text corpus
# clean demonstration notebooks
# * host video dataset online, descriptive filenames
# dataset info table
# sequence diagram for creating a translator
# GUI with gradio or something
Research Papers
# datasets: clips, text, sentences, disambiguation
# rule based translation: describe entire repo
# deep sign-to-text: pipeline + experiments
# deep text-to-sign: pipeline + experiments
Servers / Product
# ML inference server
# Django backend server
# React Native mobile app
@software{mdsr2023slt,
author = {Mudassar Iqbal},
title = {Sign Language Translator: Python Library and AI Framework},
year = {2023},
publisher = {GitHub},
howpublished = {\url{https://github.com/sign-language-translator/sign-language-translator}},
}
This project is licensed under the Apache 2.0 License. You are permitted to use the library, create modified versions, or incorporate pieces of the code into your own work. Your product or research, whether commercial or non-commercial, must provide appropriate credit to the original author(s) by citing this repository.
Stay tuned for research papers!
This project started in October 2021 as a BS Computer Science final-year project with 3 students and 1 supervisor. After 9 months at university, it became a hobby project for Mudassar, who has continued working on it (as of 2024-09-23).
Count total number of lines of code (Package: 14,034 + Tests: 2,928):
git ls-files | grep '\.py' | xargs wc -l
Just for fun 🙃
Q: What was the deaf student's favorite course?
A: Communication skills
Q: Why was the ML engineer sad?
A: Triplet loss