
VoiceId
Wrapper around Microsoft Cognitive Services - Speaker Recognition API
Installation
Sign up and get a new API key (a Speaker Recognition API key):
https://www.microsoft.com/cognitive-services
$ gem install voice_id
Examples
identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")
profile = identification.create_profile
profile_id = profile["identificationProfileId"]
path_to_audio = '/path/to/some/audio_file.wav'
short_audio = true
operation_url = identification.create_enrollment(profile_id, short_audio, path_to_audio)
operation_id = identification.get_operation_id(operation_url)
identification.get_operation_status(operation_id)
profile_ids = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b", ...]
path_to_test_audio = '/path/to/some/audio_file.wav'
short_audio = true
identification_operation_url = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
identification_operation_id = identification.get_operation_id(identification_operation_url)
identification.get_operation_status(identification_operation_id)
APIs
Provides methods for two APIs (Identification and Verification).
All audio samples provided to the API must be in the following format:
Container: WAV
Encoding: PCM
Rate: 16 kHz
Sample Format: 16 bit
Channels: Mono
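The format requirements above can be checked locally before uploading a sample. The following is a minimal sketch, not part of the gem: `wav_format` and `acceptable_wav?` are hypothetical helper names, and the code assumes a standard RIFF/WAVE layout in which the "fmt " chunk carries the encoding, channel count, sample rate and bit depth.

```ruby
# Hypothetical helpers (not provided by voice_id).
# Returns a Hash describing the "fmt " chunk of a WAV byte string, or nil
# when the data is not a RIFF/WAVE container or has no "fmt " chunk.
def wav_format(bytes)
  bytes = bytes.b # work on raw bytes
  return nil unless bytes.bytesize >= 12 && bytes[0, 4] == "RIFF" && bytes[8, 4] == "WAVE"

  offset = 12
  while offset + 8 <= bytes.bytesize
    chunk_id = bytes[offset, 4]
    chunk_size = bytes[offset + 4, 4].unpack1("V") # little-endian uint32
    if chunk_id == "fmt "
      return nil if offset + 8 + 16 > bytes.bytesize

      audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
        bytes[offset + 8, 16].unpack("vvVVvv")
      return { audio_format: audio_format, channels: channels,
               sample_rate: sample_rate, bits_per_sample: bits }
    end
    # Chunks are padded to an even number of bytes.
    offset += 8 + chunk_size + (chunk_size.odd? ? 1 : 0)
  end
  nil
end

# True when the data is PCM (format tag 1), mono, 16 kHz, 16 bit.
def acceptable_wav?(bytes)
  wav_format(bytes) == { audio_format: 1, channels: 1,
                         sample_rate: 16_000, bits_per_sample: 16 }
end
```

A file on disk could be checked with `acceptable_wav?(File.binread(path))`. The check only inspects the header; it says nothing about whether the audio contains recognizable speech.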
Identification API
Identify a person from a list of people. This is a text-independent API.
Before a speaker can be identified, the speaker's profile must be enrolled with a minimum
of 30 seconds of recognizable audio.
identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")
create_profile
Each person needs a unique profile. This creates a new one.
profile = identification.create_profile
create_enrollment(profile_id, short_audio, audio_file_path)
An enrollment is how audio samples are associated with a profile (training the service). For the Identification API a minimum of 30 seconds of recognizable speech is required. This can be done through multiple enrollments. This method creates a new enrollment for a profile.
profile_id = "1234567890"
path_to_audio = '/path/to/some/audio_file.wav'
short_audio = true
identification.create_enrollment(profile_id, short_audio, path_to_audio)
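Because the required format is fixed (16 kHz, 16 bit, mono), the length of a sample follows directly from the size of its audio data, which makes it easy to tally how much audio has been enrolled so far. This is a rough sketch with hypothetical helper names (`pcm_seconds`, `enough_enrollment_audio?`); it measures total audio, not recognizable speech, so passing this check is necessary but may not be sufficient.

```ruby
# Hypothetical helpers (not provided by voice_id).
# Bytes of audio data per second: sample rate * bytes per sample * channels.
PCM_BYTES_PER_SECOND = 16_000 * 2 * 1

# Duration in seconds of a given number of PCM data bytes.
def pcm_seconds(data_bytes)
  data_bytes.to_f / PCM_BYTES_PER_SECOND
end

# True when the combined samples reach the required total duration.
def enough_enrollment_audio?(data_byte_sizes, required_seconds = 30)
  data_byte_sizes.sum { |size| pcm_seconds(size) } >= required_seconds
end
```

For a file with a plain 44-byte WAV header, `File.size(path) - 44` approximates the data size; files with extra header chunks will be slightly overestimated.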
get_operation_id(operation_status_url)
Certain endpoints take time to process, so they return a URL for you to check on the status of the operation. Use this method to parse the operation id out of that URL. You can then pass the id to #get_operation_status.
operation_status_url = identification.create_enrollment(profile_id, short_audio, path_to_audio)
identification_operation_id = identification.get_operation_id(operation_status_url)
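To illustrate what parsing out the id involves, here is a minimal sketch that assumes the operation id is the last path segment of the status URL. This is an assumption for illustration, not the gem's implementation; `operation_id_from_url` is a hypothetical name, and #get_operation_id should be preferred in real code.

```ruby
require "uri"

# Hypothetical helper (not provided by voice_id).
# Assumes the operation id is the final path segment of the status URL.
def operation_id_from_url(operation_status_url)
  path = URI.parse(operation_status_url).path
  path.split("/").reject(&:empty?).last
end
```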
get_operation_status(operation_id)
Check the status of an operation by passing in its operation id (use #get_operation_id to get the id).
identification.get_operation_status(identification_operation_id)
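Since operations finish asynchronously, the status usually has to be checked more than once. The sketch below polls until the operation ends or a retry limit is reached. It assumes that #get_operation_status returns a Hash with a "status" key that eventually becomes "succeeded" or "failed"; that response shape is an assumption, so inspect an actual response before relying on it. `wait_for_operation` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not provided by voice_id).
# client: any object that responds to get_operation_status(operation_id).
# Assumption: the response is a Hash whose "status" value ends up as
# "succeeded" or "failed".
# Returns the final response, or nil if the operation did not finish
# within max_attempts checks.
def wait_for_operation(client, operation_id, interval: 2, max_attempts: 30)
  max_attempts.times do
    result = client.get_operation_status(operation_id)
    status = result.is_a?(Hash) ? result["status"] : nil
    return result if %w[succeeded failed].include?(status)

    sleep(interval) # wait before checking again
  end
  nil
end
```

With the objects from the examples above, this would be called as `wait_for_operation(identification, identification_operation_id)`.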
delete_profile(profile_id)
Delete a particular profile from the service.
profile_id = "1234567890"
identification.delete_profile(profile_id)
get_all_profiles
Returns a list of all the profiles for this account.
identification.get_all_profiles
get_profile(profileId)
Returns a profile's details.
profile_id = "1234567890"
identification.get_profile(profile_id)
reset_all_enrollments_for_profile(profileId)
Resets all the enrollments for a particular profile.
profile_id = "1234567890"
identification.reset_all_enrollments_for_profile(profile_id)
identify_speaker(profile_ids, short_audio, audio_file_path)
Identify a speaker by calling this method with an array of enrolled profile_ids.
Use short_audio to waive the required 5-second speech sample.
The audio sample to be analyzed should ideally be 30 seconds long, with a maximum of 5 minutes.
profile_ids = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b", ...]
path_to_test_audio = '/path/to/some/audio_file.wav'
short_audio = true
operation_url = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
identification_operation_id = identification.get_operation_id(operation_url)
identification.get_operation_status(identification_operation_id)
Verification API
Verify that a person is who they say they are. This is a text-dependent API.
Before a speaker can be verified, the speaker's profile must be enrolled with three audio samples of phrases taken from an API-provided list.
verification = VoiceId::Verification.new("MS_speaker_recognition_api_key")
list_all_verification_phrases
Get a list of accepted scripts to use when sending your audio sample.
verification.list_all_verification_phrases
create_profile
Same as Identification API
create_enrollment(profile_id, audio_file_path)
Requires 3 enrollments. Pick 3 of the accepted phrases from #list_all_verification_phrases and enroll them.
verification.create_enrollment("49a46324-fc4b-4387-aa06-090cfbf0214f", '/path/to/audio/make_him_an_offer.wav')
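The three enrollments can be sent in one pass. The following sketch wraps #create_enrollment in a small loop and refuses to start with fewer than three recordings. `enroll_verification_samples` is a hypothetical helper, and it does not check that each recording actually contains an accepted phrase.

```ruby
# Hypothetical helper (not provided by voice_id).
# client: any object that responds to create_enrollment(profile_id, path),
# such as a VoiceId::Verification instance.
# Returns the responses of the individual enrollment calls, in order.
def enroll_verification_samples(client, profile_id, audio_paths)
  if audio_paths.length < 3
    raise ArgumentError, "expected at least 3 audio files, got #{audio_paths.length}"
  end

  audio_paths.map { |path| client.create_enrollment(profile_id, path) }
end
```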
delete_profile(profile_id)
Same as Identification API
get_all_profiles
Same as Identification API
get_profile(profile_id)
Same as Identification API
reset_all_enrollments_for_profile(profile_id)
Same as Identification API
verify_speaker(profile_id, audio_file_path)
The user (profile) must already be enrolled with 3 of the accepted phrases (#list_all_verification_phrases).
Once the enrollments have been accepted, a recording of one of the accepted phrases can be checked against the enrolled profile.
verification.verify_speaker("86935587-b631-4cc7-a59t-8e580d71522g", "/path/to/audio/offer_converted.wav")