African Whisper: ASR for African Languages

A package for fast fine-tuning and API endpoint deployment of the Whisper model, built to accelerate Automatic Speech Recognition (ASR) for African languages. It provides seamless pipelines for fine-tuning the model and deploying it for transcription and translation.

Features

  • 🔧 Fine-tune the Whisper model on any audio dataset from Huggingface, e.g., Mozilla's Common Voice datasets.

  • 📊 View training run metrics on Wandb.

  • 🎙️ Test your fine-tuned model using Gradio UI or directly on an audio file (.mp3 or .wav).

  • 🚀 Deploy an API endpoint for audio file transcription or translation.

  • 🐳 Containerize your API endpoint application and push it to Docker Hub.

Why Whisper? 🤔

  • 🌐 Extensive Training Data: Trained on 680,000 hours of multilingual and multitask (translation and transcription) supervised data from the web.

  • 🗣️ Sequence-based Understanding: Whisper considers the full sequence of spoken words, ensuring accurate context recognition, unlike word-level embedding models such as Word2Vec.

  • 💻 Simplification for Applications: Deploy one model for transcribing and translating a multitude of languages, without sacrificing quality or context.

For more details, refer to the Whisper ASR model paper.
For a working demonstration, check this repo.

🚀 Getting Started

Prerequisites

  • Sign up for Hugging Face and create your access tokens; use this guide. (A quick token check is sketched after this list.)

  • Sign up for Weights and Biases and get your API key; use this guide.

  • Watch the demo video here.
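
Before starting a run, you can confirm both tokens work. A minimal sketch, assuming the huggingface_hub and wandb packages are available in your environment (the token values are placeholders):

from huggingface_hub import login
import wandb

# Validates the token against the Hugging Face Hub; raises if it is invalid.
login(token="hf_your_read_token")

# Returns True once the W&B key is accepted.
wandb.login(key="your_wandb_api_key")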

Colab

Step 1: Installation

!pip install africanwhisper
# On Colab, restart the session after installation due to a numpy installation issue.

Step 2: Set Parameters

# Set the parameters (refer to the 'Usage on VM' section for more details)
huggingface_read_token = " "
huggingface_write_token = " "
dataset_name = "mozilla-foundation/common_voice_16_1"
language_abbr = []                                    # Example: ["ti", "yi"]. See abbreviations at
                                                      # https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1.
                                                      # Note: choose a small dataset so you don't run out of memory.
model_id = "model-id"                                 # Example: "openai/whisper-small", "openai/whisper-medium"
processing_task = "automatic-speech-recognition"
wandb_api_key = " "
use_peft = True                                       # Note: PEFT only works on a notebook with GPU support.
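
Hardcoding secrets in a notebook is easy to leak. An optional sketch that reads them from environment variables instead (the variable names are illustrative, not required by the package):

import os

# Pull secrets from the environment rather than embedding them in the notebook.
huggingface_read_token = os.environ.get("HF_READ_TOKEN", "")
huggingface_write_token = os.environ.get("HF_WRITE_TOKEN", "")
wandb_api_key = os.environ.get("WANDB_API_KEY", "")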

Step 3: Prepare the Model

from training.data_prep import DataPrep

# Initialize the DataPrep class and prepare the model
process = DataPrep(
    huggingface_read_token,
    dataset_name,
    language_abbr,
    model_id,
    processing_task,
    use_peft
)
tokenizer, feature_extractor, feature_processor, model = process.prepare_model()
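
As a quick sanity check, and to see how much PEFT shrinks the trainable footprint, you can count parameters. A sketch, assuming model is a standard PyTorch module:

# Report trainable vs. total parameters of the prepared model.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")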

Step 4: Preprocess the Dataset

# Load and preprocess the dataset
processed_dataset = process.load_dataset(
    feature_extractor=feature_extractor,
    tokenizer=tokenizer,
    processor=feature_processor,
    num_samples=None  # Number of samples to load from each dataset.
                      # Set to None to load the entire dataset;
                      # e.g. num_samples=100 loads 100 samples from each dataset.
)
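
Before training, it is worth confirming the splits loaded as expected. A sketch, assuming processed_dataset is dict-like (e.g. a Hugging Face DatasetDict):

# Print each split and its size.
for split_name, split in processed_dataset.items():
    print(split_name, len(split))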

Step 5: Train the Model

from training.model_trainer import Trainer

# Initialize the Trainer class and train the model
trainer = Trainer(
    huggingface_write_token,
    model_id,
    processed_dataset,
    model,
    feature_processor,
    feature_extractor,
    tokenizer,
    wandb_api_key,
    use_peft
)
trainer.train(
    max_steps=100,
    learning_rate=1e-3,
    per_device_train_batch_size=96,
    per_device_eval_batch_size=64,
    optim="adamw_bnb_8bit"
)

# Optional parameters for training:
#     max_steps (int): The maximum number of training steps (default is 100).
#     learning_rate (float): The learning rate for training (default is 1e-5).
#     per_device_train_batch_size (int): The batch size per GPU for training (default is 96).
#     per_device_eval_batch_size (int): The batch size per GPU for evaluation (default is 64).
#     optim (str): The optimizer used for training (default is "adamw_bnb_8bit")
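
If the values above exhaust your GPU memory, you can pass more conservative settings through the same parameters. An illustrative sketch (the numbers are assumptions, not recommendations from the package):

trainer.train(
    max_steps=500,
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    optim="adamw_bnb_8bit"
)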

Step 6: Test Model using an Audio File

# Using a PEFT fine-tuned model
from deployment.peft_speech_inference import SpeechInference

model_name = "your-finetuned-model-name-on-huggingface-hub"   # e.g., "KevinKibe/whisper-small-af"
huggingface_read_token = " "
task = "desired-task"                                         # either 'translate' or 'transcribe'
audiofile_dir = "location-of-audio-file"                      # filetype should be .mp3 or .wav

# Initialize the SpeechInference class and run inference
inference = SpeechInference(model_name, huggingface_read_token)
pipeline = inference.pipe_initialization()
transcription = inference.output(pipeline, audiofile_dir, task)

# Access different parts of the output
print(transcription.text)                                       # The entire text transcription.
print(transcription.chunks)                                     # List of individual text chunks with timestamps.
print(transcription.timestamps)                                 # List of timestamps for each chunk.
print(transcription.chunk_texts)                                # List of texts for each chunk.
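
To inspect the output chunk by chunk, a sketch assuming timestamps and chunk_texts are parallel lists, as the field descriptions above suggest:

# Print each chunk's timestamp alongside its text.
for ts, text in zip(transcription.timestamps, transcription.chunk_texts):
    print(ts, text)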

# Using a fully fine-tuned model
from deployment.speech_inference import SpeechTranscriptionPipeline, ModelOptimization

model_name = "your-finetuned-model-name-on-huggingface-hub"   # e.g., "KevinKibe/whisper-small-af"
huggingface_read_token = " "
task = "desired-task"                                         # either 'translate' or 'transcribe'
audiofile_dir = "location-of-audio-file"                      # filetype should be .mp3 or .wav

# Optimize the model for inference
model_optimizer = ModelOptimization(model_name=model_name)
model_optimizer.convert_model_to_optimized_format()
model = model_optimizer.load_transcription_model()

# Initiate the transcription model
inference = SpeechTranscriptionPipeline(
        audio_file_path=audiofile_dir,
        task=task,
        huggingface_read_token=huggingface_read_token
    )

# To get transcriptions
transcription = inference.transcribe_audio(model=model)
print(transcription)

# To get transcriptions with speaker labels
alignment_result = inference.align_transcription(transcription)
diarization_result = inference.diarize_audio(alignment_result)
print(diarization_result)

# Generate subtitles (.srt format); the file is saved in the root directory
inference.generate_subtitles(transcription, alignment_result, diarization_result)
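
To keep the plain transcription around, a sketch that writes it to disk (assumes transcription prints as text, as in the print calls above):

from pathlib import Path

# Save the transcription next to the generated subtitles.
Path("transcription.txt").write_text(str(transcription), encoding="utf-8")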

🖥️ Using the CLI

Step 1: Clone and Install Dependencies

  • Clone the Repository: Clone or download the application code to your local machine.
git clone https://github.com/KevKibe/African-Whisper.git
  • Create a virtual environment for the project and activate it.
python3 -m venv venv
source venv/bin/activate
  • Install the dependencies by running:
pip install -r requirements.txt
  • Navigate to the source directory:
cd src

Step 2: Finetune the Model

  • To start training, use the following command:
python -m training.main --huggingface_read_token YOUR_HUGGING_FACE_READ_TOKEN_HERE --huggingface_write_token YOUR_HUGGING_FACE_WRITE_TOKEN_HERE --dataset_name AUDIO_DATASET_NAME --language_abbr LANGUAGE_ABBREVIATION LANGUAGE_ABBREVIATION --model_id MODEL_ID --processing_task PROCESSING_TASK --wandb_api_key YOUR_WANDB_API_KEY_HERE --use_peft

Flags:
# --use_peft: Optional flag to use PEFT fine-tuning. Leave it out to perform full fine-tuning.
  • Find a description of these commands here.
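
For example, a PEFT run on Common Voice (all values illustrative):

python -m training.main --huggingface_read_token hf_xxx --huggingface_write_token hf_yyy --dataset_name mozilla-foundation/common_voice_16_1 --language_abbr ti yi --model_id openai/whisper-small --processing_task automatic-speech-recognition --wandb_api_key your-wandb-key --use_peft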

Step 3: Get Inference

Install ffmpeg

  • To get inference from your fine-tuned model, follow these steps:

  • Ensure that ffmpeg is installed by running the following commands:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
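
To confirm the installation, check that the binary is on your PATH:

ffmpeg -version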

To get inference on the CLI locally

  • Navigate to the deployment directory:
cd src/deployment
  • Create a .env file (e.g. with nano .env), add these keys, and save the file:
MODEL_NAME="your-finetuned-model"
HUGGINGFACE_READ_TOKEN="huggingface-read-token"
  • To perform transcriptions and translations:
# PEFT FINETUNED MODELS
python -m deployment.peft_speech_inference_cli --audio_file FILENAME --task TASK 

# FULLY FINETUNED MODELS
python -m deployment.speech_inference_cli --audio_file FILENAME --task TASK --perform_diarization --perform_alignment

Flags:
# --perform_diarization: Optional flag to perform speaker diarization.
# --perform_alignment: Optional flag to perform alignment.
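
Concrete invocations (the filename is illustrative):

# PEFT fine-tuned model
python -m deployment.peft_speech_inference_cli --audio_file recording.wav --task transcribe

# Fully fine-tuned model, with optional alignment and diarization
python -m deployment.speech_inference_cli --audio_file recording.wav --task transcribe --perform_alignment --perform_diarization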

🛳️ Step 4: Deployment

  • To deploy your fine-tuned model as a REST API endpoint, follow these instructions.

Contributing

Contributions are welcome and encouraged.

Before contributing, please take a moment to review our Contribution Guidelines for important information on how to contribute to this project.

If you're unsure about anything or need assistance, don't hesitate to reach out to us or open an issue to discuss your ideas.

We look forward to your contributions!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any enquiries, please reach out to me at keviinkibe@gmail.com.
