Research
Recent Trends in Malicious Packages Targeting Discord
The Socket research team breaks down a sampling of malicious packages that download and execute files, among other suspicious behaviors, targeting the popular Discord platform.
pronunciation-dictionary-utils
CLI and library to modify pronunciation dictionaries (any language).
Readme
Library and CLI to modify pronunciation dictionaries (any language).
export-vocabulary
: export vocabulary from dictionariesexport-phonemes
: export phoneme set from dictionariesmerge
: merge dictionaries togetherextract
: extract subset of dictionary vocabularymap-symbols-in-pronunciations
: map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPAmap-symbols-in-pronunciations-json
: map phonemes/symbols in pronunciations to phoneme/symbol specified in fileremove-symbols-from-vocabulary
: remove phonemes/symbols from vocabularyremove-symbols-from-pronunciations
: remove phonemes/symbols from pronunciationsremove-symbols-from-words
: remove characters/symbols from wordschange-formatting
: change formatting of dictionariesselect-single-pronunciation
: select single pronunciationchange-word-casing
: transform all words to upper- or lower-casesort-words
: sort dictionary after wordssort-pronunciations
: sort dictionary pronunciationsnormalize-weights
: normalize pronunciation weights for each wordpip install pronunciation-dictionary-utils --user
usage: dict-cli [-h] [-v]
{export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
...
This program provides methods to modify pronunciation dictionaries.
positional arguments:
{export-vocabulary,export-phonemes,merge,extract,map-symbols-in-pronunciations,map-symbols-in-pronunciations-json,remove-symbols-from-vocabulary,remove-symbols-from-pronunciations,remove-symbols-from-words,change-formatting,select-single-pronunciation,change-word-casing,sort-words,sort-pronunciations,normalize-weights}
description
export-vocabulary export vocabulary from dictionaries
export-phonemes export phoneme set from dictionaries
merge merge dictionaries together
extract extract subset of dictionary vocabulary
map-symbols-in-pronunciations map phonemes/symbols in pronunciations to another phoneme/symbol, e.g., mapping ARPAbet to IPA
map-symbols-in-pronunciations-json map phonemes/symbols in pronunciations to phoneme/symbol specified in file
remove-symbols-from-vocabulary remove phonemes/symbols from vocabulary
remove-symbols-from-pronunciations remove phonemes/symbols from pronunciations
remove-symbols-from-words remove characters/symbols from words
change-formatting change formatting of dictionaries
select-single-pronunciation select single pronunciation
change-word-casing transform all words to upper- or lower-case
sort-words sort dictionary after words
sort-pronunciations sort dictionary pronunciations
normalize-weights normalize pronunciation weights for each word
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
# Download CMU dictionary
wget https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict \
-O "/tmp/example.dict"
# Change formatting to remove numbers from words, comments and save as UTF-8
dict-cli change-formatting \
"/tmp/example.dict" \
--deserialization-encoding "ISO-8859-1" \
--consider-numbers \
--consider-pronunciation-comments \
--serialization-encoding "UTF-8"
# Export phoneme set
dict-cli export-phonemes \
"/tmp/example.dict" \
"/tmp/example-phoneme-set.txt"
# Export vocabulary
dict-cli export-vocabulary \
"/tmp/example.dict" \
"/tmp/example-vocabulary.txt"
# Keep first pronunciation for each word and discard the rest
dict-cli select-single-pronunciation \
"/tmp/example.dict" \
--mode "first"
# Replace all "ER0" phonemes with "ER"
dict-cli map-symbols-in-pronunciations \
"/tmp/example.dict" \
"ER0" "ER"
# update
sudo apt update
# install Python 3.8-3.12 for ensuring that tests can be run
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv \
python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary-utils.git
cd pronunciation-dictionary-utils
# create virtual environment
python3.8 -m pipenv install --dev
# first install the tool like in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary-utils
# activate environment
python3.8 -m pipenv shell
# run tests
tox
Final lines of test result output:
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
If you want to cite this repo, you can use this BibTeX-entry generated by GitHub (see About => Cite this repository).
Taubert, S., and Przybysz, N. (2024). pronunciation-dictionary-utils (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10560153
FAQs
CLI and library to modify pronunciation dictionaries (any language).
We found that pronunciation-dictionary-utils demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
The Socket research team breaks down a sampling of malicious packages that download and execute files, among other suspicious behaviors, targeting the popular Discord platform.
Security News
Socket CEO Feross Aboukhadijeh joins a16z partners to discuss how modern, sophisticated supply chain attacks require AI-driven defenses and explore the challenges and solutions in leveraging AI for threat detection early in the development life cycle.
Security News
NIST's new AI Risk Management Framework aims to enhance the security and reliability of generative AI systems and address the unique challenges of malicious AI exploits.