github.com/reriiasu/speech-to-text

v0.4.1

Version published: 7 months ago

Created: 2 years ago

speech-to-text

Real-time transcription using faster-whisper

architecture

Audio is captured from a microphone using sounddevice. Silero VAD (Voice Activity Detection) detects the silent parts, and the speech between two silences is treated as a single piece of voice data. This audio data is then converted to text using faster-whisper.
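
The segmentation step described above can be sketched as follows. This is a minimal illustration, not the project's code: a simple energy threshold stands in for Silero VAD, and the names `frame_energy`, `split_on_silence`, `threshold`, and `max_silent_frames` are hypothetical.

```python
from typing import Iterable, List

Frame = List[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def split_on_silence(
    frames: Iterable[Frame],
    threshold: float = 0.01,     # assumed energy cutoff; a real VAD returns a speech probability
    max_silent_frames: int = 2,  # assumed number of silent frames that closes an utterance
) -> List[List[Frame]]:
    """Group speech frames into utterances, closing one after enough silence."""
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            # Speech frame: extend the current utterance and reset the silence counter.
            current.append(frame)
            silent_run = 0
        elif current:
            # Silent frame inside an utterance: count it, but do not buffer it.
            silent_run += 1
            if silent_run >= max_silent_frames:
                utterances.append(current)
                current = []
                silent_run = 0
    if current:
        utterances.append(current)
    return utterances
```

Each returned utterance would then be handed to the transcription model as one unit, which is why well-separated sentences transcribe quickly.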

The HTML-based GUI allows you to check the transcription results and make detailed settings for the faster-whisper.

Transcription speed

If the sentences are well separated, the transcription takes less than a second.

Large-v2 model
Executed with CUDA 11.7 on an NVIDIA GeForce RTX 3060 12GB.

Installation

pip install .

for Windows

Run "run.bat". It performs the following actions:

Create a Python virtual environment.
Install pip packages.
Run speech_to_text.

Usage

python -m speech_to_text
Select "App Settings" and configure the settings.
Select "Model Settings" and configure the settings.
Select "Transcribe Settings" and configure the settings.
Select "VAD Settings" and configure the settings.
Start Transcription

If you use the OpenAI API for text proofreading, set OPENAI_API_KEY as an environment variable.
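
As a minimal sketch of how an application could check for this variable before enabling proofreading (the helper name `get_openai_api_key` is hypothetical and is not taken from this project):

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the OPENAI_API_KEY value, or None if it is unset or blank."""
    # Allow a mapping to be passed in so the lookup can be exercised without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None
```

On Linux or macOS the variable can be set with `export OPENAI_API_KEY=...`, and in a Windows command prompt with `set OPENAI_API_KEY=...`, before starting the application.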

Notes

If you select local_model in "Model size or path", the model with the same name in the local folder will be referenced.

Demo

demo

News

2023-06-26

Add generation of audio files from input sound.
Add synchronization of audio files with transcription.
Audio playback and text highlighting are linked.

2023-06-29

Add transcription from audio files (WAV format only).

2023-07-03

Add sending of transcription results from a WebSocket server to a WebSocket client.
Example use: displaying subtitles in a live stream.

2023-07-05

Add generation of SRT files from transcription results.
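
For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the text. The sketch below shows one way to produce that layout from `(start, end, text)` tuples in seconds. The tuple shape and the function names are assumptions for illustration, not this project's implementation.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text); assumed shape


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    return "\n".join(blocks)
```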

2023-07-08

Add support for mp3, ogg, and other audio files (which formats work depends on what soundfile supports).
Add a setting to include non-speech data in the buffer. This increases memory usage but improves transcription accuracy.

2023-07-09

Add non-speech threshold setting.

2023-07-11

Add a text proofreading option via the OpenAI API.
Transcription results can be proofread.

2023-07-12

Add a feature that synchronizes audio playback and word highlighting when Word Timestamps is true.
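
To illustrate the idea, the sketch below looks up which word should be highlighted at a given playback position, given per-word timestamps. It is a hypothetical example: the `(start, end, text)` tuple shape and the `word_at` function are assumptions, and the project's GUI may implement this differently.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text); assumed shape


def word_at(words: Sequence[Word], position: float) -> Optional[int]:
    """Return the index of the word being spoken at `position`, or None in a gap.

    `words` must be sorted by start time and must not overlap.
    """
    starts = [start for start, _end, _text in words]
    # Find the last word whose start time is at or before the playback position.
    index = bisect_right(starts, position) - 1
    if index < 0:
        return None
    _start, end, _text = words[index]
    return index if position < end else None
```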

2023-10-01

Support repetition_penalty and no_repeat_ngram_size in transcribe_settings.
Update packages.

2023-11-27

Support "large-v3" model.
Update faster-whisper requirement to include the latest version "0.10.0".

2024-07-23

Support "Faster Distil-Whisper" model.
Update faster-whisper requirement to include the latest version "1.0.3".
Update packages.
Add run.bat for Windows.

Todo

Save and load previous settings.
Use Silero VAD
Allow local parameters to be set from the GUI.
Support additional options in faster-whisper 0.8.0.

Package last updated on 23 Jul 2024
