speaker-verification-toolkit
This module contains a few tools for building a simple speaker verification system.
You can install it from PyPI:
$ pip install speaker-verification-toolkit
To import it in your own projects:
import speaker_verification_toolkit.tools as svt
Usage
find_nearest_voice_data(voice_data_list, voice_sample)
Finds the voice data entry nearest to the given voice sample. It can be used to make a naive Accept/Reject decision.
voice_data_list: a list containing the voice data of every entry in the dataset.
voice_sample: the reference voice sample.
returns: the index of the element in voice_data_list that is nearest to voice_sample.
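A minimal sketch of the idea behind find_nearest_voice_data, assuming it returns the index of the entry with the smallest distance under some distance function. The helpers nearest_index, naive_decision and mean_frame_distance are hypothetical names for illustration, and the per-entry speaker labels are an assumption (the toolkit's function only returns an index). The placeholder distance just compares aligned frames; a DTW-based distance such as compute_distance would be a more realistic choice.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
Voice = Sequence[Frame]


def mean_frame_distance(a: Voice, b: Voice) -> float:
    """Hypothetical placeholder: mean Euclidean distance between aligned frames."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both voices must contain at least one frame")
    return sum(math.dist(a[i], b[i]) for i in range(n)) / n


def nearest_index(voice_data_list: List[Voice], voice_sample: Voice,
                  distance: Callable[[Voice, Voice], float]) -> int:
    """Return the index of the entry closest to voice_sample (first one on ties)."""
    if not voice_data_list:
        raise ValueError("voice_data_list must not be empty")
    distances = [distance(entry, voice_sample) for entry in voice_data_list]
    return min(range(len(distances)), key=distances.__getitem__)


def naive_decision(voice_data_list: List[Voice], speaker_labels: List[str],
                   voice_sample: Voice, claimed_speaker: str,
                   distance: Callable[[Voice, Voice], float]) -> bool:
    """Accept the claim only if the nearest entry belongs to the claimed speaker."""
    idx = nearest_index(voice_data_list, voice_sample, distance)
    return speaker_labels[idx] == claimed_speaker


if __name__ == "__main__":
    db = [[[0.0, 0.0], [0.1, 0.0]], [[1.0, 1.0], [1.1, 0.9]]]
    labels = ["alice", "bob"]
    probe = [[0.9, 1.0], [1.0, 1.0]]
    print(nearest_index(db, probe, mean_frame_distance))  # prints 1
    print(naive_decision(db, labels, probe, "bob", mean_frame_distance))  # prints True
```

This decision is "naive" because it never rejects an unknown speaker outright: it only checks which enrolled entry is closest.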
compute_distance(sample1, sample2)
Computes the distance between sample1 and sample2 using an O(n) DTW (Dynamic Time Warping) algorithm.
sample1: the MFCC data extracted from audio signal 1.
sample2: the MFCC data extracted from audio signal 2.
returns: a float representing the minimum distance between sample1 and sample2.
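An O(n) bound for DTW usually comes from an approximation such as FastDTW; which variant compute_distance actually uses is an assumption not confirmed here. For reference, the sketch below shows the exact, quadratic-time DTW recurrence on sequences of MFCC frames, using the Euclidean distance between frames as the local cost (also an assumption). dtw_distance is a hypothetical name, not part of the toolkit.

```python
import math
from typing import Sequence


def dtw_distance(sample1: Sequence[Sequence[float]],
                 sample2: Sequence[Sequence[float]]) -> float:
    """Exact DTW between two sequences of feature vectors (e.g. MFCC frames).

    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]).
    Runs in O(len(sample1) * len(sample2)) time, keeping only two rows in memory.
    """
    n, m = len(sample1), len(sample2)
    if n == 0 or m == 0:
        raise ValueError("both samples must contain at least one frame")
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # D[0][0] = 0, every other border cell is infinite
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(sample1[i - 1], sample2[j - 1])  # local frame distance
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

Because the exact version costs O(n*m) time, linear-time approximations become attractive for long recordings, at the price of possibly returning a slightly larger distance than the true minimum.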
extract_mfcc(signal_data, samplerate=16000, winlen=0.025, winstep=0.01)
Compute MFCC features from an audio signal
signal_data: the audio signal from which to compute features. Should be an N*1 array.
samplerate: the sample rate of the signal we are working with, in Hz.
winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).
winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).
returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
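The parameter and return descriptions above match the docstring of the mfcc function in python_speech_features. Assuming the toolkit frames the signal the same way (winlen and winstep converted to samples, with the last partial frame zero-padded), the NUMFRAMES dimension of the result can be worked out as below. num_frames is a hypothetical helper for illustration only.

```python
import math


def num_frames(num_samples: int, samplerate: int = 16000,
               winlen: float = 0.025, winstep: float = 0.01) -> int:
    """Number of analysis frames, assuming python_speech_features-style framing."""
    frame_len = int(round(winlen * samplerate))    # 400 samples at 16 kHz
    frame_step = int(round(winstep * samplerate))  # 160 samples at 16 kHz
    if num_samples <= frame_len:
        return 1  # a short signal is padded into a single frame
    # The final partial window is zero-padded, hence the ceiling.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)
```

With the defaults, one second of 16 kHz audio gives 1 + ceil((16000 - 400) / 160) = 1 + ceil(97.5) = 99 frames.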
extract_mfcc_from_wav_file(path, samplerate=16000, winlen=0.025, winstep=0.01)
Compute MFCC features from a wav file
path: the path of the WAV file to open.
samplerate: the desired sample rate, in Hz. Default is 16000. To skip resampling, pass None.
winlen: the length of the analysis window in seconds. Default is 0.025s (25 milliseconds).
winstep: the step between successive windows in seconds. Default is 0.01s (10 milliseconds).
returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
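A rough sketch of what loading a WAV file with optional resampling might look like, using only the standard-library wave module. The read_wav and resample_linear names, the restriction to 16-bit PCM, averaging channels down to mono, scaling to [-1, 1), and linear-interpolation resampling are all assumptions; the toolkit may read and resample audio differently.

```python
import array
import sys
import wave
from typing import List, Optional, Tuple


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Hypothetical resampler using linear interpolation between neighbouring samples."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # position of output sample k on the input time axis
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] + (nxt - samples[i]) * frac)
    return out


def read_wav(path, samplerate: Optional[int] = 16000) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV file (path or file object); samplerate=None skips resampling."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    # Average interleaved channels into mono and scale to [-1, 1).
    samples = [sum(pcm[i:i + channels]) / (channels * 32768.0)
               for i in range(0, len(pcm), channels)]
    if samplerate is None or samplerate == rate:
        return samples, rate
    return resample_linear(samples, rate, samplerate), samplerate
```

Linear interpolation does no anti-aliasing filtering, so when downsampling a proper band-limited resampler is the safer choice.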
rms_silence_filter(data, samplerate=16000, segment_length=None, threshold=0.001135)
Removes the silent parts of the audio signal data. It does not work on signals affected by environmental noise.
Consider applying a noise filter before using this silence filter, or make sure the environmental noise is low enough to be treated as silence.
data: the audio signal data
samplerate: the sample rate of the signal, in Hz. If no segment_length is given, segment_length defaults to samplerate/100 (about 0.01 seconds per segment).
segment_length: the number of frames per segment. For example, at a sample rate SR, a segment length of SR/100 represents a chunk containing 0.01 seconds of audio.
threshold: the threshold value. Segments whose RMS value is less than or equal to the threshold are cut off. The default value was defined in [1] (see References).
returns: the input data with the silent parts removed.
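A minimal sketch of an RMS-based silence filter using the parameters described above. It assumes the signal is split into consecutive segments of segment_length samples, that each segment's RMS is compared with the threshold, and that amplitudes are normalized to [-1, 1] (the small default threshold suggests this, but it is an assumption). rms_silence_filter_sketch is a hypothetical name, not the toolkit's implementation.

```python
import math
from typing import List, Optional, Sequence


def rms_silence_filter_sketch(data: Sequence[float], samplerate: int = 16000,
                              segment_length: Optional[int] = None,
                              threshold: float = 0.001135) -> List[float]:
    """Drop segments whose RMS is <= threshold and concatenate the rest."""
    if segment_length is None:
        segment_length = samplerate // 100  # about 0.01 s of audio per segment
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    kept: List[float] = []
    for start in range(0, len(data), segment_length):
        segment = list(data[start:start + segment_length])
        rms = math.sqrt(sum(x * x for x in segment) / len(segment))
        if rms > threshold:  # at or below the threshold counts as silence
            kept.extend(segment)
    return kept
```

A trailing segment shorter than segment_length is judged on its own RMS. Any steady background noise whose RMS exceeds the threshold is kept, which is why a noise filter should run first.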
References
[1] - Muhammad Asadullah & Shibli Nisar, "A SILENCE REMOVAL AND ENDPOINT DETECTION APPROACH FOR SPEECH PROCESSING", National University of Computer and Emerging Sciences, Peshawar