CMUdict: Python wrapper for cmudict
CMUdict is a versioned Python wrapper package for The CMU Pronouncing
Dictionary data files. Its main purpose is to expose the data with few or no
assumptions about how it will be used.
Installation
cmudict is available on PyPI. Simply install it with pip:
pip install cmudict
Usage
The cmudict data set includes four data files: cmudict.dict, cmudict.phones,
cmudict.symbols, and cmudict.vp. See The CMU Pronouncing Dictionary for
details on the data. Chances are, if you're here, you already know what's in the
files.
Each file can be accessed through three functions: one that returns the raw
(string) contents, one that returns a binary stream of the file, and one that
does minimal processing of the file into an appropriate structure:
>>> import cmudict
>>> cmudict.dict()
>>> cmudict.dict_string()
>>> cmudict.dict_stream()
>>> cmudict.phones()
>>> cmudict.phones_string()
>>> cmudict.phones_stream()
>>> cmudict.symbols()
>>> cmudict.symbols_string()
>>> cmudict.symbols_stream()
>>> cmudict.vp()
>>> cmudict.vp_string()
>>> cmudict.vp_stream()
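As an illustration of working with the processed structure, here is a minimal
sketch that counts syllables. It assumes that cmudict.dict() returns a mapping
from a lowercase word to a list of pronunciations, each a list of ARPAbet phone
strings in which vowels end in a stress digit. The helper syllable_counts and
the SAMPLE mapping are hypothetical names written for this example; they are
not part of the package.

```python
from typing import Dict, List

# Assumed shape of the value returned by cmudict.dict():
# word -> list of pronunciations, each a list of ARPAbet phones.
Pronunciations = Dict[str, List[List[str]]]

# Hand-written sample in that assumed shape, so the sketch runs without the
# package installed. With the package available, pass cmudict.dict() instead.
SAMPLE: Pronunciations = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
}


def syllable_counts(word: str, pronunciations: Pronunciations) -> List[int]:
    """Return one syllable count per known pronunciation of `word`."""
    counts = []
    for phones in pronunciations.get(word.lower(), []):
        # Vowel phones carry a trailing stress digit (0, 1, or 2), so
        # counting them gives the number of syllables.
        counts.append(sum(1 for phone in phones if phone[-1].isdigit()))
    return counts


print(syllable_counts("hello", SAMPLE))
print(syllable_counts("unknownword", SAMPLE))
```

Passing the mapping in as an argument keeps the helper independent of how the
data was loaded, so it works equally with the full dictionary or a small subset.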
Three additional functions are included to maintain compatibility with NLTK:
cmudict.entries(), cmudict.raw(), and cmudict.words(). See the
nltk.corpus.reader.cmudict
documentation for details:
>>> cmudict.entries()
>>> cmudict.raw()
>>> cmudict.words()
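The NLTK-style entries can be regrouped by word when a lookup table is more
convenient than a flat list. This is a minimal sketch, assuming that
cmudict.entries() yields (word, phones) pairs as the NLTK reader does, with one
pair per pronunciation. The helper group_entries and the sample list are
hypothetical and written only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed shape of one item from cmudict.entries(): (word, list of phones).
Entry = Tuple[str, List[str]]


def group_entries(entries: Iterable[Entry]) -> Dict[str, List[List[str]]]:
    """Collect every pronunciation of each word under a single key."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for word, phones in entries:
        grouped[word].append(phones)
    return dict(grouped)


# Hand-written sample in the assumed shape; with the package installed,
# pass cmudict.entries() instead.
sample_entries = [
    ("hello", ["HH", "AH0", "L", "OW1"]),
    ("hello", ["HH", "EH0", "L", "OW1"]),
    ("world", ["W", "ER1", "L", "D"]),
]

grouped = group_entries(sample_entries)
print(len(grouped["hello"]))
```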
And finally, the license for the cmudict data set is available as well:
>>> cmudict.license_string()
Credits
Built on or modeled after the following open source projects: