praasper

A tool for automatic speech recognition and annotation

PyPI (pip) · Version 0.5.6 · Maintainers: 1

Praasper


Setup | Usage | Mechanism

Praasper is an Automatic Speech Recognition (ASR) framework designed to help researchers transcribe audio files into utterances, from a single word to a complete sentence, with a decent level of accuracy in both transcription and timestamps.

[Figure: mechanism]

In Praasper, we adopt a rather simple and straightforward pipeline to extract utterance-level information from audio files. The pipeline includes VAD (Praditor), ASR (SenseVoiceSmall) and LLM (Qwen).

How to use

Here is one of the simplest examples:

import praasper

model = praasper.init_model()
model.annote("data_folder")

Here are the parameters you can pass to init_model (ASR and LLM) and to the annote method (all the others):

| Param | Default | Description |
|---|---|---|
| ASR | iic/SenseVoiceSmall | Model name as the ASR core. Check out FunASR's model list for available models. |
| LLM | Qwen/Qwen2.5-1.5B-Instruct | Model name as the LLM core. Check out Qwen's model list for available models. |
| input_path | - | Path to the folder where audio files are stored. |
| seg_dur | 10. | Segment large audio into pieces, in seconds. |
| min_pause | 0.2 | Minimum pause duration between two utterances, in seconds. |
| min_speech | 0.2 | Minimum duration for an utterance, in seconds. |
| language | None | "zh" for Mandarin, "yue" for Cantonese, "en" for English, "ja" for Japanese, "ko" for Korean, and None for automatic language detection. |

Here is a code example showing how you can use these parameters:

import praasper

model = praasper.init_model(
    ASR="iic/SenseVoiceSmall",
    LLM="Qwen/Qwen2.5-1.5B-Instruct"
)

model.annote(
    input_path="data_folder",
    min_pause=.8,
    min_speech=.2,
    language=None,
    seg_dur=15.
)
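To build intuition for min_pause and min_speech, here is a minimal sketch of how such thresholds could act on a list of detected speech intervals. It is an illustration under assumptions, not Praasper's or Praditor's actual implementation: the function name merge_and_filter and the merge-then-filter order are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def merge_and_filter(
    intervals: List[Interval],
    min_pause: float = 0.2,
    min_speech: float = 0.2,
) -> List[Interval]:
    """Merge intervals separated by short pauses, then drop short ones.

    Hypothetical illustration only; not taken from Praasper or Praditor.
    """
    merged: List[Interval] = []
    for onset, offset in sorted(intervals):
        if merged and onset - merged[-1][1] < min_pause:
            # Gap is shorter than min_pause: treat it as the same utterance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    # Discard utterances shorter than min_speech.
    return [(a, b) for a, b in merged if b - a >= min_speech]


raw = [(0.0, 1.0), (1.5, 2.5), (4.0, 4.1), (6.0, 7.0)]
print(merge_and_filter(raw, min_pause=0.8, min_speech=0.2))
# [(0.0, 2.5), (6.0, 7.0)]
```

With min_pause=0.8, the 0.5 s gap between the first two intervals is bridged, and the 0.1 s blip is dropped because it is shorter than min_speech. A larger min_pause therefore tends to give fewer, longer utterances.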

Fine-tune Praditor

Praasper ships with a default set of parameters for Praditor, but the defaults may not always be optimal for your audio. In that case, we recommend using a custom set of parameters for Praditor.

  • Use the latest version of Praditor (v1.3.1). It supports VAD.
  • Annotate the audio file. Fine-tune the parameters until the results fit your standard.
  • Click Save under the Current mode (top-right corner).

Praditor will then save a .txt param file to the same folder as the input audio file. Praasper uses this file to override the default params.

ASR/LLM model recommendation

For the ASR core, iic/SenseVoiceSmall is the only recommendation at the moment.

For the LLM core, the recommended models are (from larger to smaller): Qwen/Qwen3-4B-Instruct-2507 and Qwen/Qwen2.5-1.5B-Instruct (default). The default is small but good enough for laptop users. You are also welcome to try other Qwen models.

Mechanism

Praditor performs Voice Activity Detection (VAD) to (1) segment large audio files into smaller pieces and (2) extract utterances. It can generate intervals with millisecond-level precision. It was originally a Speech Onset Detection (SOT) algorithm we developed for language researchers.

SenseVoiceSmall is used to transcribe the audio. It does not provide timestamps by itself. It is a lightweight ASR model that runs even on a laptop, and it has better support for short audio files than Whisper.

In addition, in case users want to designate one language throughout the transcription, an LLM (Qwen/Qwen2.5-1.5B-Instruct) is added to the framework to correct potential errors in the transcription.
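The three stages above can be sketched as a small function that takes each stage as a callable. This is only a rough outline of the data flow (timestamps from VAD, text from ASR, optional LLM correction). The names run_pipeline, fake_vad, fake_asr and fake_llm are hypothetical stand-ins, not part of Praasper, Praditor, FunASR or Qwen.

```python
from typing import Callable, Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def run_pipeline(
    audio: str,
    vad: Callable[[str], List[Interval]],
    asr: Callable[[str, Interval], str],
    llm: Optional[Callable[[str], str]] = None,
) -> List[Dict[str, object]]:
    """Pair VAD timestamps with ASR text, optionally corrected by an LLM."""
    rows: List[Dict[str, object]] = []
    for onset, offset in vad(audio):
        # The ASR stage yields text only; timestamps come from the VAD stage.
        text = asr(audio, (onset, offset))
        if llm is not None:
            text = llm(text)
        rows.append({"onset": onset, "offset": offset, "text": text})
    return rows


# Stand-in stages so the sketch runs without any model installed.
def fake_vad(audio: str) -> List[Interval]:
    return [(0.12, 0.85), (1.40, 2.31)]


def fake_asr(audio: str, interval: Interval) -> str:
    return "hello" if interval[0] < 1 else "wrld"


def fake_llm(text: str) -> str:
    return {"wrld": "world"}.get(text, text)


for row in run_pipeline("demo.wav", fake_vad, fake_asr, fake_llm):
    print(row)
```

Passing the stages in as callables keeps the sketch independent of any particular model, which mirrors how the ASR and LLM cores can be swapped by name in init_model.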

Setup

pip installation

pip install -U praasper

If the installation succeeded and you don't need GPU acceleration, you can stop right here.

GPU Acceleration (Windows/Linux)

Currently, Praasper utilizes SenseVoiceSmall from FunASR as the ASR core.

FunASR automatically detects the best available device. However, you still need to install a GPU-enabled build of torch first in order to enable CUDA acceleration.

  • For macOS users, only CPU is supported as the processing device.
  • For Windows/Linux users, the priority order should be: CUDA -> CPU.

If you have no experience in installing CUDA, follow the steps below:

First, go to command line and check the latest CUDA version your system supports:

nvidia-smi

The output should look like this (it means that this device supports CUDA up to version 12.9):

| NVIDIA-SMI 576.80                 Driver Version: 576.80         CUDA Version: 12.9     |
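If you would rather read the version programmatically, the following sketch pulls the CUDA version out of the nvidia-smi header text and turns it into a tag such as cu129, the form used in the PyTorch wheel index URL in the install command later in this section. The helper name cuda_tag_from_smi is hypothetical. Note that PyTorch does not necessarily publish wheels for every CUDA version, so the resulting tag is a starting point to check against the PyTorch install page rather than a guaranteed match.

```python
import re
from typing import Optional


def cuda_tag_from_smi(header: str) -> Optional[str]:
    """Return a tag such as 'cu129' from nvidia-smi header text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", header)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


header = (
    "| NVIDIA-SMI 576.80                 Driver Version: 576.80"
    "         CUDA Version: 12.9     |"
)
tag = cuda_tag_from_smi(header)
print(tag)  # cu129
print(f"https://download.pytorch.org/whl/{tag}")
```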

Next, go to NVIDIA CUDA Toolkit and download the latest version, or whichever version fits your system and needs.

Lastly, install torch that fits your CUDA version. Find the correct pip command in this link.

Here is an example for CUDA 12.9:

pip install --force-reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129
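After installing, one way to check whether torch can actually see the GPU is the short script below. It follows the CUDA -> CPU priority described above. The function name pick_device is hypothetical, and FunASR performs its own device detection, so this is only a sanity check and not something Praasper requires you to run.

```python
def pick_device() -> str:
    """Return 'cuda' if torch can see a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints cpu on a machine with an NVIDIA GPU, the installed torch build is probably CPU-only or does not match the driver's supported CUDA version.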

(Advanced) uv installation

uv is also highly recommended for way FASTER installation. First, make sure uv is installed in your default environment:

pip install uv

Then, create a virtual environment (e.g., .venv):

uv venv .venv

You should now see a new .venv folder in your project folder. (You might also want to restart the terminal.)

Lastly, install praasper (by adding uv before pip):

uv pip install -U praasper

For CUDA support, here is an example for downloading torch that fits CUDA 12.9:

uv pip install --reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu129

Dev Plan

  • Add support for more LLM models.
  • Separate LLM strategies for error correction and language correction.
