Phonemizer -- foʊnmaɪzɚ
- The phonemizer allows simple phonemization of words and texts in many languages.
- It provides both the phonemize command-line tool and the Python function phonemizer.phonemize. See the package's documentation.
- It is based on four backends: espeak, espeak-mbrola, festival and segments. The backends have different properties and capabilities, summarized in the table below. The choice of backend is left to the user.
  - espeak-ng is text-to-speech software that supports more than 100 languages and produces IPA (International Phonetic Alphabet) output.
  - espeak-ng-mbrola uses the SAMPA phonetic alphabet instead of IPA but does not preserve word boundaries.
  - festival is another text-to-speech engine. Its phonemizer backend currently supports only American English. It uses a custom phoneset, but it allows tokenization at the syllable level.
  - segments is a Unicode tokenizer that builds a phonemization from a grapheme-to-phoneme mapping provided as a file by the user.
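
To make the idea behind the segments backend concrete, here is a minimal sketch of phonemization driven by a user-supplied grapheme-to-phoneme mapping. It does not use the segments library: the function name `phonemize_with_profile`, the in-memory `profile` dictionary (standing in for the user's mapping file) and the greedy longest-match strategy are all assumptions made for illustration.

```python
def phonemize_with_profile(word, profile):
    """Convert a word to phones by greedy longest-match lookup in a mapping.

    `profile` maps graphemes (one or more characters) to phones. This is an
    illustrative sketch, not the segments library's implementation.
    """
    # Try longer graphemes first so that "sh" wins over "s" followed by "h".
    graphemes = sorted(profile, key=len, reverse=True)
    phones = []
    i = 0
    while i < len(word):
        for grapheme in graphemes:
            if word.startswith(grapheme, i):
                phones.append(profile[grapheme])
                i += len(grapheme)
                break
        else:
            # No grapheme in the mapping matches at this position.
            raise ValueError(f"no mapping for {word[i]!r} at position {i}")
    return phones


# Hypothetical toy mapping; a real one would be provided by the user as a file.
profile = {"sh": "ʃ", "s": "s", "h": "h", "i": "ɪ", "p": "p"}
print(phonemize_with_profile("ship", profile))  # ['ʃ', 'ɪ', 'p']
```

Because the mapping is entirely user defined, both the phone set and the supported languages depend on what the user provides.
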
| | espeak | espeak-mbrola | festival | segments |
|---|---|---|---|---|
| phone set | IPA | SAMPA | custom | user defined |
| supported languages | 100+ | 35 | US English | user defined |
| processing speed | fast | slow | very slow | fast |
| phone tokens | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| syllable tokens | :x: | :x: | :heavy_check_mark: | :x: |
| word tokens | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: |
| punctuation preservation | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: |
| stressed phones | :heavy_check_mark: | :x: | :x: | :x: |
| tie | :heavy_check_mark: | :x: | :x: | :x: |
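
Since the choice of backend depends on which features are needed, the table can be encoded as data and queried. The sketch below is only an illustration: `CAPABILITIES`, `backends_supporting` and `try_phonemize` are hypothetical helpers, not part of the package. The call to `phonemizer.phonemize` uses its `language`, `backend` and `strip` arguments; the helper returns `None` when the package or the underlying engine is not installed, assuming that case is reported as a `RuntimeError` or `OSError`.

```python
# Capabilities transcribed from the comparison table of the four backends.
CAPABILITIES = {
    "espeak": {"phone", "word", "punctuation", "stress", "tie"},
    "espeak-mbrola": {"phone"},
    "festival": {"phone", "syllable", "word", "punctuation"},
    "segments": {"phone", "word", "punctuation"},
}


def backends_supporting(*features):
    """Return the backends whose capabilities include every requested feature."""
    required = set(features)
    return [name for name, caps in CAPABILITIES.items() if required <= caps]


def try_phonemize(text, backend, language="en-us"):
    """Call phonemizer.phonemize, or return None when it cannot run here."""
    try:
        from phonemizer import phonemize
    except ImportError:
        return None  # the phonemizer package is not installed
    try:
        return phonemize(text, language=language, backend=backend, strip=True)
    except (RuntimeError, OSError):
        # Assumption: a missing engine (e.g. espeak) surfaces as one of these.
        return None


candidates = backends_supporting("word", "stress")
print(candidates)  # ['espeak']
print(try_phonemize("hello world", candidates[0]))
```

The exact transcription printed by the last line depends on the installed espeak version, so no expected output is shown for it.
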
Citation
To reference the phonemizer in your own work, please cite the following JOSS paper.
@article{Bernard2021,
doi = {10.21105/joss.03958},
url = {https://doi.org/10.21105/joss.03958},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {68},
pages = {3958},
author = {Mathieu Bernard and Hadrien Titeux},
title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
journal = {Journal of Open Source Software}
}
Licence
Copyright 2015-2021 Mathieu Bernard
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.