USTTC (Unified Speech-to-Text Client)
This project provides a simple, unified client wrapper
for multiple Speech-to-Text (STT) providers, covering the basic use cases,
and gives users an easy way to switch among and test different providers.
Background
The accuracy of Speech-to-Text (STT) has improved significantly over the past few years.
There are many cloud STT providers on the market, including big players like Google and AWS,
and a few ambitious new providers like Voicegain.ai and Assembly.ai.
As a Speech Recognition Scientist, I have reviewed many providers in the last few years,
and I have noticed that each provider has its own unique features.
However, the majority of users do not necessarily need those additional features,
especially in the early testing stage.
Their requirements are very simple and basic -- getting an accurate transcript of the provided audio.
Regarding my personal background, I am a Senior AI Scientist at Voicegain (specializing in Speech Recognition),
but this repository, USTTC, is a personal project, and I intend to work on it without any bias.
As mentioned, the goal of this project is to enable more people in the community to explore and test STT
without too much trouble dealing with varied providers, APIs, and documentation.
Installation
Please ensure that you have ffmpeg installed before installing USTTC.
You can install the module from the Python Package Index using the command below.
pip install usttc
Determine which STT providers to test
Currently, USTTC supports the following six STT providers; we plan to include more providers later on.
These six providers are included because they all have comparable accuracy,
reasonably complete features, and simple-to-use client SDKs. Now you need to decide which providers you want to test.
This is truly an overwhelming task, because there is no single right answer.
Each provider has unique strengths and weaknesses, as well as its own unique pricing strategy.
If you don't know which one is best for your application,
we suggest you test all of them on your own audio samples to get a sense. Fortunately,
USTTC makes it very easy to test multiple providers using (almost) the same code,
which is also the original intention of USTTC.
The following table shows the price of each provider, so that you can also choose based on your budget.
| Provider (Price Details[1]) | $ per minute[2] | Free Tier per month | Free Credits | Minimum per request charge[3] | Increments |
|---|---|---|---|---|---|
| Google STT | $0.0360 | 60 minutes | 8,333 minutes ($300)[4] | 15 seconds | 15 seconds |
| AWS Transcribe | $0.0240 | 60 minutes[5] | No | 15 seconds | 1 second |
| Voicegain.ai | $0.0095 | No | 5,263 minutes ($50) | 1 second | 1 second |
| Rev.ai | $0.0350 | No | 300 minutes | 15 seconds | 15 seconds |
| Assembly.ai | $0.0150 | 180 minutes | No | 1 second | 1 second |
| Deepgram | $0.0125 | No | 12,000 minutes ($150) | Not clear | Not clear |
[1]: The price may change. Please check the pricing page for each provider
[2]: This is the pay-as-you-go price. All providers offer discounts for high volumes
[3]: You need to consider this if the average audio duration is shorter than 15s in your application
[4]: The Google Cloud Free credits are distributed across all cloud services and are only valid for the first 90 days
[5]: The AWS Free Tier is only available for the first 12 months
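To see what the minimum per-request charge and billing increments in footnote [3] mean in practice, here is a small illustrative sketch. The helper function below is not part of USTTC, and the prices are copied from the table above, so please double-check them against each provider's pricing page.
import math

def billed_cost(duration_sec, price_per_min, min_sec, increment_sec):
    # Estimate the pay-as-you-go cost of a single request (illustrative only)
    billable_sec = max(duration_sec, min_sec)                               # apply the minimum per-request charge
    billable_sec = math.ceil(billable_sec / increment_sec) * increment_sec  # round up to the billing increment
    return billable_sec / 60 * price_per_min

# A 5-second utterance is billed as 15 seconds on Google STT, but only 5 seconds on Voicegain.ai
print(billed_cost(5, 0.0360, 15, 15))  # 0.009   -> $0.0090
print(billed_cost(5, 0.0095, 1, 1))    # ~0.0008 -> $0.0008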
Create account on selected STT providers
Once you decide which providers to test, you can create an account with each of them by following the steps below.
Google STT
- Sign up for Google Cloud Platform. https://console.cloud.google.com/getting-started
- Enable Google Cloud Speech API. https://cloud.google.com/endpoints/docs/openapi/enable-api
- Create a storage bucket. You can use the default setting. https://cloud.google.com/storage/docs/creating-buckets
- Create a service account. Add two roles: Cloud Speech Client and Storage Object Admin. https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console
- Create a new JSON key for the service account you created. https://cloud.google.com/iam/docs/creating-managing-service-account-keys
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key_file(
    asr_provider=AsrProvider.GOOGLE,
    filename="<YOUR_GOOGLE_CLOUD_JSON_KEY_FILE_PATH>",
    google_storage_bucket="<YOUR_GOOGLE_STORAGE_BUCKET_NAME>"
)
AWS Transcribe
- Sign up for AWS. https://portal.aws.amazon.com/billing/signup#/start
- Create an S3 bucket. You can use the default setting. Please note the region of your S3 bucket. https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html
- Create a User Group. Attach the AmazonS3FullAccess and AmazonTranscribeFullAccess permissions to the group. https://docs.aws.amazon.com/IAM/latest/UserGuide/id_groups_create.html
- Add a User to the created User Group. Get the user's access key ID and secret access key. https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key(
    asr_provider=AsrProvider.AMAZON_AWS,
    key="<YOUR_AWS_USER_ACCESS_KEY_ID>",
    aws_secret_access_key="<YOUR_AWS_USER_SECRET_ACCESS_KEY>",
    region_name='<YOUR_S3_BUCKET_REGION>',
    s3_bucket='<YOUR_S3_BUCKET_NAME>'
)
Voicegain.ai
- Sign up. https://console.voicegain.ai/signup
- Generate JWT Token. https://support.voicegain.ai/hc/en-us/articles/360028023691-JWT-Authentication
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key(
    asr_provider=AsrProvider.VOICEGAIN,
    key="<YOUR_VOICEGAIN_JWT_TOKEN>"
)
Rev.ai
- Sign up. https://www.rev.ai/auth/signup
- Generate Access Token. https://www.rev.ai/access_token
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key(
    asr_provider=AsrProvider.REV,
    key="<YOUR_REV_ACCESS_TOKEN>"
)
Assembly.ai
- Sign up. https://app.assemblyai.com/signup
- Get API Key on your account page. https://app.assemblyai.com/account
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key(
    asr_provider=AsrProvider.ASSEMBLY_AI,
    key="<YOUR_ASSEMBLY_AI_API_KEY>"
)
Deepgram
- Sign up. https://console.deepgram.com/signup
- Create an API Key from the dashboard.
from usttc import AsrClientFactory, AsrProvider
asr_client = AsrClientFactory.get_client_from_key(
    asr_provider=AsrProvider.DEEPGRAM,
    key="<YOUR_DEEPGRAM_API_KEY>"
)
Usage
Both pre-recorded audio files and real-time audio streams can be transcribed with USTTC.
Transcribe Pre-Recorded Audio
Using USTTC, it's super easy to transcribe your audio file in (almost) any format. Here is an end-to-end example with a .wav audio file as the input.
from usttc.audio import AudioFile
audio = AudioFile(file_path="<YOUR_AUDIO_FILE_PATH>")
result = asr_client.recognize(audio)
print(result.transcript)
Multiple Speakers
An audio file can contain multiple speakers in two ways.
- Multi-channel audio: each channel has one speaker (only stereo audio is supported). In this case, you need to configure separate_speaker_per_channel (see the sketch after the notes below).
- Mono audio: all speakers are mixed on the same channel. In this case, you need to configure diarization (see the sketch after the notes below).
Please note:
- If your audio is stereo but both channels have the same content, you should NOT configure separate_speaker_per_channel.
- DO NOT use diarization if speakers are already separated by channel.
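For reference, here is a minimal sketch of both configurations. The Config import path and the way the options are passed to recognize are assumptions on my part; please check the examples in this repository for the exact usage.
from usttc.audio import AudioFile
from usttc.config import Config  # ASSUMED import path and argument names; check the repository examples

# asr_client is created as shown in the provider sections above

# Stereo audio, one speaker per channel (do NOT also enable diarization)
stereo_audio = AudioFile(file_path="<YOUR_STEREO_AUDIO_FILE_PATH>")
stereo_result = asr_client.recognize(stereo_audio, config=Config(separate_speaker_per_channel=True))

# Mono audio, multiple speakers mixed on the same channel
mono_audio = AudioFile(file_path="<YOUR_MONO_AUDIO_FILE_PATH>")
mono_result = asr_client.recognize(mono_audio, config=Config(diarization=True))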
Compare transcription results
To compare the results from multiple recognizers and decide which one is more accurate for your application, I normally start by reviewing a few results to get a sense of the strengths and weaknesses of each recognizer. Sometimes, after seeing a few examples, I can easily tell which recognizers work for a specific project and which do not.
If you want to compare the results in a more scientific manner, you can prepare a gold-standard reference and calculate the Word Error Rate (WER) of the results from each STT provider. However, calculating WER is not trivial, because we don't want to penalize a recognizer when the only differences between its result and the gold reference are punctuation and capitalization. Moreover, numbers should be accepted in either digit or spelled-out form.
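As a starting point, here is a small self-contained sketch (not part of USTTC) that lowercases the text, strips punctuation, and computes WER with a word-level edit distance. Note that it still counts "2" vs. "two" as an error, which is exactly the kind of issue the package described below takes care of.
import re

def normalize(text):
    # Lowercase and strip punctuation so formatting differences are not penalized
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def wer(reference, hypothesis):
    ref, hyp = normalize(reference), normalize(hypothesis)
    # Word-level Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("I have 2 dogs.", "i have two dogs"))  # 0.25: "2" vs "two" still counts as an error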
transcribe-compare package
Voicegain.ai provides a Python package called transcribe-compare
to help you calculate WER (and do more than that).
It solves many issues in calculating WER, including the punctuation, capitalization, and digit issues mentioned above.
You can install the module from the Python Package Index using the command below.
pip install transcribe-compare
We provide a simple example of using USTTC and transcribe-compare together.
You can also check their GitHub page for more examples of advanced use cases.
Ensemble
[This feature will be available soon]
After you compare the results from multiple recognizers, you might realize that none of them is perfect (a cold, hard reality). Different STT providers make mistakes in different places. If your budget allows, you can run multiple recognizers at the same time and get higher accuracy by ensembling their results. This feature is on our roadmap.
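Ensembling itself is not released yet, but you can already run the same audio through several clients and compare their transcripts side by side, using only the recognize call shown above. The client variable names below are placeholders for clients created as shown in the provider sections.
from usttc.audio import AudioFile

audio = AudioFile(file_path="<YOUR_AUDIO_FILE_PATH>")
clients = {
    "google": google_asr_client,         # placeholder: a Google client created as shown above
    "voicegain": voicegain_asr_client,   # placeholder: a Voicegain client created as shown above
    "assembly_ai": assembly_asr_client,  # placeholder: an Assembly.ai client created as shown above
}
transcripts = {name: client.recognize(audio).transcript for name, client in clients.items()}
for name, transcript in transcripts.items():
    print(f"--- {name} ---\n{transcript}\n")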
Transcribe Audio Stream
[This feature will be available soon]
In some real-time applications, it's important to stream the audio to the recognizer
and receive results as the audio comes in.
All of the STT providers supported by USTTC offer streaming recognition.
The streaming wrapper will be available soon.