at16k is a Python library to perform automatic speech recognition or speech to text conversion.
Pronounced as at sixteen k.
Try out the interactive demo here.
The goal of this project is to provide the community with a production quality speech-to-text library.
It is recommended that you install at16k in a virtual environment.
$ pip install at16k
To install from source instead (requires poetry):
$ git clone https://github.com/at16k/at16k.git
$ poetry env use python3.6
$ poetry install
Currently, three models are available for speech to text conversion: en_8k, en_16k, and en_16k_rnnt.
To download all the models:
$ python -m at16k.download all
Alternatively, you can download only the model you need. For example:
$ python -m at16k.download en_8k
$ python -m at16k.download en_16k
$ python -m at16k.download en_16k_rnnt
By default, the models will be downloaded and stored at <HOME_DIR>/.at16k. To override the default, set the environment variable AT16K_RESOURCES_DIR. For example:
$ export AT16K_RESOURCES_DIR=/path/to/my/directory
Set the same environment variable whenever you use at16k afterwards, whether through the command line, the library, or the REST API.
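The lookup described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour (environment variable first, `<HOME_DIR>/.at16k` otherwise), not at16k's actual implementation; the function name `resolve_resources_dir` is hypothetical.

```python
import os
from pathlib import Path


def resolve_resources_dir(environ=None):
    """Return the directory where at16k models are expected to be stored.

    Hypothetical helper: mirrors the documented rule, not the library's code.
    """
    env = os.environ if environ is None else environ
    override = env.get("AT16K_RESOURCES_DIR")
    if override:
        # The user asked for a custom location.
        return Path(override).expanduser()
    # Documented default: <HOME_DIR>/.at16k
    return Path.home() / ".at16k"


if __name__ == "__main__":
    print(resolve_resources_dir())
```

Printing the resolved path before running a conversion is a quick way to confirm that the shell running at16k sees the same `AT16K_RESOURCES_DIR` as the shell that downloaded the models.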
at16k accepts wav files with a single (mono) channel and a sample rate that matches the model: 8 kHz for en_8k, 16 kHz for en_16k and en_16k_rnnt.
Use ffmpeg to convert your audio/video files to an acceptable format. For example,
# For 8 kHz
$ ffmpeg -i <input_file> -ar 8000 -ac 1 -ab 16 <output_file>
# For 16 kHz
$ ffmpeg -i <input_file> -ar 16000 -ac 1 -ab 16 <output_file>
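After converting, it can help to confirm that the output file has the shape the ffmpeg commands above request. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` is hypothetical, and the checks (one channel, 8000 or 16000 Hz) are inferred from the `-ac 1` and `-ar` flags above rather than taken from at16k itself.

```python
import wave

# Sample rates inferred from the ffmpeg examples above (assumption).
SUPPORTED_RATES = (8000, 16000)


def check_wav(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected 1 channel, found {channels}")
    if rate not in SUPPORTED_RATES:
        problems.append(f"expected a sample rate of 8000 or 16000 Hz, found {rate}")
    return problems
```

Note that `wave` only reads uncompressed PCM wav files, so a file it cannot open is also a sign that the conversion step needs another look.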
at16k supports two modes for performing ASR: offline and real-time. It also comes with a handy command line utility for quickly trying out different models and use cases.
Here are a few examples:
# Offline ASR, 8 kHz sampling rate
$ at16k-convert -i <path_to_wav_file> -m en_8k
# Offline ASR, 16 kHz sampling rate
$ at16k-convert -i <path_to_wav_file> -m en_16k
# Real-time ASR, 16 kHz sampling rate, from a file, beam decoding
$ at16k-convert -i <path_to_wav_file> -m en_16k_rnnt -d beam
# Real-time ASR, 16 kHz sampling rate, from mic input, greedy decoding (requires pyaudio)
$ at16k-convert -m en_16k_rnnt -d greedy
If the at16k-convert binary is not available for some reason, replace it with:
python -m at16k.bin.speech_to_text ...
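When scripting conversions, the fallback above can be automated. The sketch below builds the argument list for either entry point, using only the `-i`, `-m`, and `-d` flags shown in the examples. The helper name `build_convert_command` is hypothetical and not part of at16k.

```python
import shutil
import sys


def build_convert_command(model, wav_path=None, decoder=None):
    """Build an argv list for at16k-convert, falling back to the module form."""
    if shutil.which("at16k-convert"):
        cmd = ["at16k-convert"]
    else:
        # Binary not on PATH: use the module entry point instead.
        cmd = [sys.executable, "-m", "at16k.bin.speech_to_text"]
    if wav_path is not None:
        cmd += ["-i", wav_path]  # omit -i to read from the microphone
    cmd += ["-m", model]
    if decoder is not None:
        cmd += ["-d", decoder]  # "beam" or "greedy", as in the examples
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with file paths that contain spaces.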
Check this file for examples on how to use at16k as a library.
The duration of your audio file should be less than 30 seconds when using en_8k, and less than 15 seconds when using en_16k. No error is raised if the duration exceeds these limits, but the transcript may contain errors and missing text.
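Because over-length files fail silently, a pre-flight check can be useful. The following sketch measures the duration with Python's standard `wave` module and compares it to the limits stated above. The names `MAX_SECONDS`, `wav_duration_seconds`, and `exceeds_limit` are hypothetical; models without a documented limit (such as en_16k_rnnt) are treated as unrestricted, which is an assumption.

```python
import wave

# Limits as documented above; other models have no stated limit.
MAX_SECONDS = {"en_8k": 30.0, "en_16k": 15.0}


def wav_duration_seconds(path):
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def exceeds_limit(path, model):
    """Return True if the file is too long for the given model."""
    limit = MAX_SECONDS.get(model)
    if limit is None:
        return False  # assumption: no documented limit means no check
    # The documented rule is "less than", so the limit itself is too long.
    return wav_duration_seconds(path) >= limit
```

Longer recordings can be split into shorter segments before conversion, for example with ffmpeg.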
This software is distributed under the MIT license.
We would like to thank the Google TensorFlow Research Cloud (TFRC) program for providing access to cloud TPUs.