nemo-text-processing
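nemo-text-processing is a Python package for text normalization (TN) and inverse text normalization (ITN) for ASR and TTS. TN converts written-form text into its spoken form, as needed before speech synthesis. ITN does the reverse, converting spoken-form text such as raw ASR output back into written form.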
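To make the two directions concrete, here is a deliberately tiny, hypothetical sketch for cardinal numbers from 0 to 999. It is not how nemo-text-processing works internally: the package builds its grammars as weighted finite-state transducers (see the WFST_Tutorial notebook). The names `normalize_cardinal` and `inverse_normalize_cardinal` are illustrative only and are not part of the package's API.

```python
# Hypothetical toy illustration of TN and ITN for cardinals 0-999.
# nemo-text-processing itself uses WFST grammars, not lookup code like this.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def normalize_cardinal(written: str) -> str:
    """TN direction: written form "123" -> spoken form "one hundred twenty three"."""
    n = int(written)
    if not 0 <= n <= 999:
        raise ValueError("this toy sketch only handles 0-999")
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest or not hundreds:
        # 1-19, or a bare "zero" when the whole number is 0
        words.append(ONES[rest])
    return " ".join(words)


def inverse_normalize_cardinal(spoken: str) -> str:
    """ITN direction: spoken form "one hundred twenty three" -> written form "123"."""
    value = 0
    for word in spoken.split():
        if word == "hundred":
            value *= 100
        elif word in TENS[2:]:
            value += 10 * TENS.index(word)
        elif word in ONES:
            value += ONES.index(word)
        else:
            raise ValueError(f"unknown word: {word}")
    return str(value)


if __name__ == "__main__":
    spoken = normalize_cardinal("123")
    print(spoken)                               # one hundred twenty three
    print(inverse_normalize_cardinal(spoken))   # 123
```

Real TN and ITN also have to handle context such as dates, currency, ordinals and ambiguous tokens, which a lookup table like this does not attempt.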
| Google Colab Notebook | Description |
|---|---|
| Text_(Inverse)_Normalization.ipynb | Quick-start guide |
| WFST_Tutorial | In-depth tutorial on grammar customization |
If you have a question that is not answered in the GitHub Discussions, encounter a bug, or have a feature request, please create a GitHub issue. We also welcome pull requests that fix a bug or add a feature.
We recommend setting up a fresh Conda environment to install NeMo-text-processing.
conda create --name nemo_tn python==3.10
conda activate nemo_tn
(Optional) To use hybrid text normalization, install PyTorch using the configurator on the PyTorch website, for example:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
NOTE: The command used to install PyTorch may depend on your system.
Use this installation mode if you want the latest released version.
pip install nemo_text_processing
NOTE: This should work on any Linux OS with x86_64. Pip installation on macOS and Windows is not supported because of the Pynini dependency. On any platform other than Linux x86_64, pip tries to compile Pynini from source, which requires the OpenFst headers and libraries to be in the expected place. If it does work for you on such a platform, it is because you happen to have installed OpenFst correctly in the right place. So to pip install Pynini on macOS, you must have a pre-compiled, pre-installed OpenFst; the Pynini README for that version should tell you which OpenFst version it needs and which --enable-foo flags to use.
Instead, we recommend using conda-forge to install Pynini on macOS or Windows:
conda install -c conda-forge pynini=2.1.6.post1
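The platform rule in the note above can be captured in a small helper, for example in a setup script. This is a hypothetical sketch: `pip_install_supported` and `suggested_install_commands` are not part of nemo-text-processing, the helper only inspects `platform.system()` and `platform.machine()`, and it assumes that once Pynini is available from conda-forge, `pip install nemo_text_processing` can proceed on macOS or Windows.

```python
from __future__ import annotations

import platform


def pip_install_supported(system: str | None = None, machine: str | None = None) -> bool:
    """True on Linux x86_64, the platform where a plain pip install is expected to work."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine == "x86_64"


def suggested_install_commands(system: str | None = None, machine: str | None = None) -> list[str]:
    """Return install commands suited to the given (or current) platform."""
    if pip_install_supported(system, machine):
        return ["pip install nemo_text_processing"]
    # Assumption: on other platforms, install the prebuilt Pynini from conda-forge
    # first so that pip does not try to compile it against OpenFst.
    return [
        "conda install -c conda-forge pynini=2.1.6.post1",
        "pip install nemo_text_processing",
    ]


if __name__ == "__main__":
    for command in suggested_install_commands():
        print(command)
```

Taking `system` and `machine` as optional parameters keeps the decision testable without running on each platform.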
Use this installation mode if you want a version from a particular GitHub branch (e.g., main).
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo-text-processing.git@{BRANCH}#egg=nemo_text_processing
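If you script this installation mode, the two commands can be built for a chosen branch as sketched below. The helper `branch_install_commands` is hypothetical and not part of the package. It uses `sys.executable` so that pip targets the interpreter running the script, and it only builds the argument lists; running them (for example with `subprocess.run(cmd, check=True)`) is left to you.

```python
from __future__ import annotations

import sys

REPO_URL = "https://github.com/NVIDIA/NeMo-text-processing.git"


def branch_install_commands(branch: str = "main") -> list[list[str]]:
    """Build argv lists that install nemo_text_processing from a GitHub branch."""
    requirement = f"git+{REPO_URL}@{branch}#egg=nemo_text_processing"
    return [
        # Install Cython first, matching the two-step sequence above.
        [sys.executable, "-m", "pip", "install", "Cython"],
        [sys.executable, "-m", "pip", "install", requirement],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in branch_install_commands("main"):
        print(" ".join(cmd))
```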
Use this installation mode if you are contributing to NeMo-text-processing.
git clone https://github.com/NVIDIA/NeMo-text-processing
cd NeMo-text-processing
./reinstall.sh
NOTE: If you only want the toolkit without the additional conda-based dependencies, you can run pip install -e . instead of ./reinstall.sh, with the NeMo-text-processing root directory as your current working directory.
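Whichever installation mode you use, a quick way to confirm the result is to check that the relevant modules can be found. This sketch relies only on the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it. The helper name `missing_modules` is hypothetical, and the `torch` check only matters if you set up the optional hybrid text normalization.

```python
from __future__ import annotations

import importlib.util

# The package itself and its Pynini dependency.
REQUIRED = ("nemo_text_processing", "pynini")
# PyTorch is only needed for hybrid text normalization.
OPTIONAL = ("torch",)


def missing_modules(names: tuple[str, ...]) -> list[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required modules:", ", ".join(missing))
    else:
        print("nemo_text_processing and pynini are installed.")
    for name in missing_modules(OPTIONAL):
        print(f"Optional module not found: {name} (needed only for hybrid TN)")
```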
We welcome community contributions! Please refer to the CONTRIBUTING.md for guidelines.
@inproceedings{zhang21ja_interspeech,
author={Yang Zhang and Evelina Bakhturina and Boris Ginsburg},
title={{NeMo (Inverse) Text Normalization: From Development to Production}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={4857--4859}
}
@inproceedings{bakhturina22_interspeech,
author={Evelina Bakhturina and Yang Zhang and Boris Ginsburg},
title={{Shallow Fusion of Weighted Finite-State Transducer and Language Model for
Text Normalization}},
year=2022,
booktitle={Proc. Interspeech 2022}
}
NeMo-text-processing is licensed under the Apache 2.0 license.