
A simple GUI to transcribe audio using OpenAI's Whisper models.
Whisper-UI requires Python. If you are not familiar with Python, be sure to install it from python.org. During installation, if you see an option labeled tkinter, tk, or tcl, be sure to check it; tkinter is required for this program to run.

After installation, it is simple to confirm that everything went well:

Run python --version. If python is unrecognized, then Python has not been added to your PATH (or is not installed).

Run python -c "import tkinter". If nothing happens, you are all set. If you see an error, tkinter was not installed with your Python distribution.

Whisper relies on a popular open-source audio/video converter called ffmpeg. You must install this as well.
See the ffmpeg source code for Windows. Here is a Windows tutorial, as well as a good StackExchange post for Mac.
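If you would rather check these prerequisites in one go, the snippet below is a small, unofficial sanity check; it is not part of Whisper-UI, and the messages are only illustrative:

```python
# Hedged sketch: verify the prerequisites described above.
import shutil
import sys

print("Python:", sys.version.split()[0])

try:
    import tkinter  # included with python.org installers when the tcl/tk option is checked
    print("tkinter: OK")
except ImportError:
    print("tkinter: MISSING - reinstall Python with the tcl/tk option enabled")

# Whisper calls out to ffmpeg, so it needs to be on your PATH.
print("ffmpeg:", shutil.which("ffmpeg") or "MISSING - install ffmpeg and add it to PATH")
```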
Download Whisper UI.cmd from this repository (or click the download link for Windows). Place it wherever you like on your computer. You can launch the program by running this file. Expect it to take a bit of time to start up the first time you run it, as it installs itself.
Download Whisper UI.sh from this repository (or click the download link for Linux/Mac). Place it wherever you like on your computer. You can launch the program by running this file. Expect it to take a bit of time to start up the first time you run it, as it installs itself.
A list of available models can be viewed here. A checkmark indicates you have already downloaded the model, while a download symbol is shown otherwise. Simply click on a model to initiate the download process.
If the UI is glitching at all, try navigating here and clicking "Reload window."
Most of the window is occupied by the console, which will display information as you adjust settings and run transcription. The "clear output" button at the bottom of the console can be used to erase all the information on screen.
The first text box allows you to enter a Unix-style pathname pattern to find the audio files you want to transcribe. For example, if you have a folder called audio_files which contains sample1.mp3 and sample2.mp3, you can grab both of them at once by writing audio_files/*.mp3 (or audio_files/* if there are no other files in the folder). Once you have entered a path or paths, you can click "List files" to display a list of all files that were found.
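Patterns like these are the same Unix-style globs that Python's standard glob module expands; the snippet below only illustrates how the matching works and is not necessarily the code Whisper-UI runs:

```python
import glob

# audio_files/*.mp3 matches every .mp3 directly inside audio_files/
print(glob.glob("audio_files/*.mp3"))  # e.g. ['audio_files/sample1.mp3', 'audio_files/sample2.mp3']

# audio_files/* matches every file in the folder, regardless of extension
print(glob.glob("audio_files/*"))
```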
If you are ready to transcribe, you can hit "Transcribe." Acceptable filetypes include .flac, .m4a, .mp3, .mp4, and .wav.
The second textbox allows you to specify the folder where you want to put the transcripts. You can enter a path to any folder. If you enter a path to a folder that doesn't exist, that folder will be created. You can click "Set output directory" to confirm the existence of the chosen folder. You can fill this box by typing or by going to "File" > "Choose output directory."
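The create-if-missing behavior works like Python's standard os.makedirs with exist_ok=True; this is only an illustration of the behavior, not Whisper-UI's actual code:

```python
import os

output_dir = "transcripts/run1"          # hypothetical path typed into the second textbox
os.makedirs(output_dir, exist_ok=True)   # creates the folder (and any parents) if it doesn't exist
```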
The three checkboxes below the second textbox allow you to control which kinds of output you want:

- A .txt file containing the transcribed text.
- A .seg file showing the "segments" of your audio file (lengths of speech with breaks between them). By default, this file is tab-separated, with each line containing the speech occurring in a segment, the start time, and the end time.
- The .json output of Whisper, which also includes a detected language code if no language is specified.

"Template formatting options..." allows you to modify the format of the plain .txt file and the way each line in the .seg file is formatted. If you modify these, be sure to click "Save" to save your choices.
For the .txt output, the placeholder <<<TEXT>>> stands for the full transcribed text: whatever you type around <<<TEXT>>> in the "Template for full transcript" field will be written around the transcript in the .txt file.

For the .seg output, the template controls how each line of the .seg file is formatted. By default, this field contains the pattern <<<SEG>>>\t<<<START>>>\t<<<END>>>. This formatting will write the speech segment's text, then the start time, and finally the end time, with tab characters in between, on each line of the .seg file.
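Template expansion of this kind amounts to simple placeholder substitution. The sketch below is illustrative only (the segment values are made up and the code is not Whisper-UI's implementation):

```python
# Default .seg line format from above; substitution shown with str.replace.
seg_template = "<<<SEG>>>\t<<<START>>>\t<<<END>>>"

segment = {"text": "hello there", "start": "0.00", "end": "1.52"}  # made-up example segment

line = (seg_template
        .replace("<<<SEG>>>", segment["text"])
        .replace("<<<START>>>", segment["start"])
        .replace("<<<END>>>", segment["end"]))
print(line)  # hello there	0.00	1.52  (tab-separated)

# The full-transcript template works the same way, with <<<TEXT>>> replaced
# by the entire transcribed text.
txt_template = "Transcript:\n<<<TEXT>>>"  # hypothetical template
print(txt_template.replace("<<<TEXT>>>", "hello there how are you"))
```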
file."Currently selected Whisper model" displays the current model you are using. Any model having the .en
suffix is a monolingual English model, and should not be used for other languages. All other models are multilingual. In general, models further down the list will be more accurate, but slower to run. They may also require more memory than your computer has. It is quite safe to attempt to use any model you like, but be advised that you may need to switch to a smaller one if a larger one fails.
"Currently selected Whisper language" displays the language Whisper will use to condition its output. You can set it to "NONE" if you prefer that Whisper automatically detect the spoken language. This may also be preferable for code-switched speech, but be advised that code-switched data in general is fairly hard to find in order to train speech models on it. As such, Whisper may handle code-switching rather poorly. Note that Whisper will generally struggle with low-resource languages.
Check "Translate to English?" if you would like the transcript of your non-English audio to be output in English. Note that Whisper will generally struggle to translate from low-resource languages.
I plan to expand this project in the future to allow access to a curated collection of ASR models from HuggingFace, but this will take some time. Other models on HF under consideration include WhisperX and some NVIDIA speech models like Canary and Parakeet.
I encourage feedback and suggestions for improvement. Please feel free to open an issue on the Issues page if you have any ideas or problems, or send me an email at drd92@georgetown.edu.