insanely-fast-whisper
An opinionated CLI to transcribe Audio files w/ Whisper on-device! Powered by 🤗 Transformers, Optimum & flash-attn
TL;DR - Transcribe 150 minutes (2.5 hours) of audio in less than 98 seconds - with OpenAI's Whisper Large v3. Blazingly fast transcription is now a reality!⚡️
pipx install insanely-fast-whisper==0.0.14 --force
Not convinced? Here are some benchmarks we ran on an NVIDIA A100 - 80GB 👇
Optimisation type | Time to Transcribe (150 mins of Audio) |
---|---|
large-v3 (Transformers) (fp32) | ~31 min (31 min 1 sec) |
large-v3 (Transformers) (fp16 + batching [24] + bettertransformer) | ~5 min (5 min 2 sec) |
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2) | ~2 min (1 min 38 sec) |
distil-large-v2 (Transformers) (fp16 + batching [24] + bettertransformer) | ~3 min (3 min 16 sec) |
distil-large-v2 (Transformers) (fp16 + batching [24] + Flash Attention 2) | ~1 min (1 min 18 sec) |
large-v2 (Faster Whisper) (fp16 + beam_size [1]) | ~9 min (9 min 23 sec) |
large-v2 (Faster Whisper) (8-bit + beam_size [1]) | ~8 min (8 min 15 sec) |
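To put the table in perspective, the timings can be turned into a real-time factor (seconds of audio processed per second of compute) and a speedup over the fp32 baseline. The sketch below only does arithmetic on the numbers in the table; the dictionary keys and function names are made up for illustration.

```python
# Wall-clock times copied from the benchmark table above, converted to seconds.
AUDIO_SECONDS = 150 * 60  # 150 minutes of audio

BENCHMARKS = {
    "large-v3 fp32": 31 * 60 + 1,
    "large-v3 fp16 + batching + bettertransformer": 5 * 60 + 2,
    "large-v3 fp16 + batching + Flash Attention 2": 1 * 60 + 38,
    "distil-large-v2 fp16 + batching + bettertransformer": 3 * 60 + 16,
    "distil-large-v2 fp16 + batching + Flash Attention 2": 1 * 60 + 18,
    "large-v2 Faster Whisper fp16": 9 * 60 + 23,
    "large-v2 Faster Whisper 8-bit": 8 * 60 + 15,
}


def realtime_factor(transcribe_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return AUDIO_SECONDS / transcribe_seconds


def speedup_over(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return BENCHMARKS[baseline] / BENCHMARKS[candidate]


if __name__ == "__main__":
    for name, seconds in BENCHMARKS.items():
        print(f"{name}: {seconds}s, {realtime_factor(seconds):.1f}x real time")
```

With these numbers, large-v3 with Flash Attention 2 runs at roughly 92x real time, about 19x faster than the fp32 baseline.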
P.S. We also ran the benchmarks on a Google Colab T4 GPU instance!
P.P.S. This project originally started as a way to showcase benchmarks for Transformers, but has since evolved into a lightweight CLI for people to use. This is purely community driven. We add whatever the community seems to have a strong demand for!
We've added a CLI to enable fast transcriptions. Here's how you can use it:
Install insanely-fast-whisper with pipx (pip install pipx or brew install pipx):
pipx install insanely-fast-whisper
⚠️ If you have Python 3.11.XX installed, pipx may parse the version incorrectly and install a very old version of insanely-fast-whisper without telling you (version 0.0.8, which won't work anymore with the current BetterTransformers). In that case, you can install the latest version by passing --ignore-requires-python to pip:
pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"
If you're installing with pip, you can pass the argument directly: pip install insanely-fast-whisper --ignore-requires-python
Run inference from any path on your computer:
insanely-fast-whisper --file-name <filename or URL>
Note: if you are running on macOS, you also need to add the --device-id mps flag.
🔥 You can run Whisper-large-v3 w/ Flash Attention 2 from this CLI too:
insanely-fast-whisper --file-name <filename or URL> --flash True
🌟 You can run distil-whisper directly from this CLI too:
insanely-fast-whisper --model-name distil-whisper/large-v2 --file-name <filename or URL>
Don't want to install insanely-fast-whisper? Just use pipx run:
pipx run insanely-fast-whisper --file-name <filename or URL>
Note: The CLI is highly opinionated and only works on NVIDIA GPUs & Mac. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run insanely-fast-whisper --help or pipx run insanely-fast-whisper --help to get all the CLI arguments along with their defaults.
The insanely-fast-whisper repo provides all-round support for running Whisper in various settings. Note that as of today, 26th Nov, insanely-fast-whisper works on both CUDA and MPS (Mac) enabled devices.
-h, --help show this help message and exit
--file-name FILE_NAME
Path or URL to the audio file to be transcribed.
--device-id DEVICE_ID
Device ID for your GPU. Just pass the device number when using CUDA, or "mps" for Macs with Apple Silicon. (default: "0")
--transcript-path TRANSCRIPT_PATH
Path to save the transcription output. (default: output.json)
--model-name MODEL_NAME
Name of the pretrained model/ checkpoint to perform ASR. (default: openai/whisper-large-v3)
--task {transcribe,translate}
Task to perform: transcribe or translate to another language. (default: transcribe)
--language LANGUAGE
Language of the input audio. (default: "None" (Whisper auto-detects the language))
--batch-size BATCH_SIZE
Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)
--flash FLASH
Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)
--timestamp {chunk,word}
Whisper supports both chunked as well as word level timestamps. (default: chunk)
--hf-token HF_TOKEN
Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips
--diarization_model DIARIZATION_MODEL
Name of the pretrained model/ checkpoint to perform diarization. (default: pyannote/speaker-diarization)
--num-speakers NUM_SPEAKERS
Specifies the exact number of speakers present in the audio file. Useful when the exact number of participants in the conversation is known. Must be at least 1. Cannot be used together with --min-speakers or --max-speakers. (default: None)
--min-speakers MIN_SPEAKERS
Sets the minimum number of speakers that the system should consider during diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be less than or equal to --max-speakers if both are specified. (default: None)
--max-speakers MAX_SPEAKERS
Defines the maximum number of speakers that the system should consider in diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be greater than or equal to --min-speakers if both are specified. (default: None)
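The three speaker options constrain each other, so it can help to check a combination before launching a long run. The sketch below restates the rules from the help text as plain Python; `validate_speaker_args` is a hypothetical helper written for illustration, not the CLI's own validation code.

```python
from typing import Optional


def validate_speaker_args(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> None:
    """Raise ValueError if the combination breaks the rules in the help text."""
    named = (
        ("num-speakers", num_speakers),
        ("min-speakers", min_speakers),
        ("max-speakers", max_speakers),
    )
    # Every value that is provided must be at least 1.
    for name, value in named:
        if value is not None and value < 1:
            raise ValueError(f"--{name} must be at least 1")

    # --num-speakers cannot be combined with a min/max range.
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError(
            "--num-speakers cannot be used together with --min-speakers or --max-speakers"
        )

    # When both bounds are given, they must form a valid range.
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("--min-speakers must be less than or equal to --max-speakers")
```

For example, `validate_speaker_args(min_speakers=2, max_speakers=4)` passes silently, while `validate_speaker_args(num_speakers=2, max_speakers=4)` raises a `ValueError`.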
How to correctly install flash-attn to make it work with insanely-fast-whisper?
Make sure to install it via pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation. Massive kudos to @li-yifei for helping with this.
How to solve an AssertionError: Torch not compiled with CUDA enabled error on Windows?
The root cause of this problem is still unknown. However, you can resolve it by manually installing torch in the virtualenv: python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. Thanks to @pto2k for debugging this.
How to avoid Out-Of-Memory (OOM) exceptions on Mac?
The mps backend isn't as optimised as CUDA, so it is far more memory hungry. Typically you can run with --batch-size 4 without any issues (this should use roughly 12GB of GPU VRAM). Don't forget to set --device-id mps.
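If you script the CLI from Python, the batch-size advice above can be folded into a small command builder. This is only a sketch: `build_command` is a hypothetical helper, and it uses only flags listed in the help output (`--file-name`, `--device-id`, `--batch-size`, `--flash`). The value 24 is the documented default, and 4 is the suggestion for `mps` from this FAQ.

```python
import shlex
from typing import List, Optional


def build_command(
    file_name: str,
    device_id: str = "0",
    batch_size: Optional[int] = None,
    flash: bool = False,
) -> List[str]:
    """Build an argv list for the insanely-fast-whisper CLI."""
    if batch_size is None:
        # Smaller batches on mps to reduce the chance of OOM errors.
        batch_size = 4 if device_id == "mps" else 24

    cmd = [
        "insanely-fast-whisper",
        "--file-name", file_name,
        "--device-id", device_id,
        "--batch-size", str(batch_size),
    ]
    if flash:
        cmd += ["--flash", "True"]
    return cmd


if __name__ == "__main__":
    # "audio.mp3" is a placeholder file name.
    print(shlex.join(build_command("audio.mp3", device_id="mps")))
```

The returned list can be passed straight to `subprocess.run`, which avoids shell-quoting problems with file names that contain spaces.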
pip install --upgrade transformers optimum accelerate
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0",  # or "mps" for Mac devices
    model_kwargs={"attn_implementation": "flash_attention_2"} if is_flash_attn_2_available() else {"attn_implementation": "sdpa"},
)

outputs = pipe(
    "<FILE_NAME>",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
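With `return_timestamps=True`, the Transformers pipeline should return a dictionary with the full `text` plus a list of `chunks`, each carrying a `(start, end)` timestamp in seconds. Assuming that shape, here is a minimal sketch for printing one timestamped line per chunk. The helper names and the sample data are made up for illustration, and the end timestamp is treated as possibly missing.

```python
from typing import Any, Dict, List, Optional


def format_seconds(value: Optional[float]) -> str:
    """Format seconds as HH:MM:SS; a missing timestamp becomes a placeholder."""
    if value is None:
        return "??:??:??"
    hours, remainder = divmod(int(value), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chunks_to_lines(outputs: Dict[str, Any]) -> List[str]:
    """Turn pipeline output into '[start - end] text' lines."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_seconds(start)} - {format_seconds(end)}] {text}")
    return lines


if __name__ == "__main__":
    # Made-up sample in the assumed output shape, not real transcription output.
    sample = {
        "text": " Hello world. Bye.",
        "chunks": [
            {"timestamp": (0.0, 2.5), "text": " Hello world."},
            {"timestamp": (3661.0, None), "text": " Bye."},
        ],
    }
    print("\n".join(chunks_to_lines(sample)))
```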
FAQs
An insanely fast whisper CLI
We found that insanely-fast-whisper demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.