
Wrapper around the Microsoft Cognitive Services Speaker Recognition API.
Sign up and get a Speaker Recognition API key at https://www.microsoft.com/cognitive-services
$ gem install voice_id
# create a new profile
identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")
profile = identification.create_profile
# => { "identificationProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f" }
# create a new enrollment for that profile
profile_id = profile["identificationProfileId"]
path_to_audio = '/path/to/some/audio_file.wav'
short_audio = true
operation_url = identification.create_enrollment(profile_id, short_audio, path_to_audio)
# => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
# check the status of operation
operation_id = identification.get_operation_id(operation_url)
# => "EF217D0C-9085-45D7-AAE0-2B36471B89B5"
identification.get_operation_status(operation_id)
# notice below that we only have 13.6 seconds of usable audio, so we need to
# submit more enrollments for this profile until we reach the 30-second minimum
# =>
# {
# "status" => "succeeded",
# "createdDateTime" => "2016-09-23T01:34:44.226642Z",
# "lastActionDateTime" => "2016-09-23T01:34:44.4795299Z",
# "processingResult" => {
# "enrollmentStatus" => "Enrolling",
# "remainingEnrollmentSpeechTime" => 16.4,
# "speechTime" => 13.6,
# "enrollmentSpeechTime"=>13.6
# }
# }
# identify a speaker
profile_ids = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b", ...]
path_to_test_audio = '/path/to/some/audio_file.wav'
short_audio = true
identification_operation_url = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
# => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B6"
identification_operation_id = identification.get_operation_id(identification_operation_url)
# => "EF217D0C-9085-45D7-AAE0-2B36471B89B6"
identification.get_operation_status(identification_operation_id)
# =>
# {
# "status" => "succeeded",
# "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
# "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
# "processingResult" => {
# "identifiedProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f",
# "confidence"=>"High"
# }
# }
Provides methods for two APIs: Identification and Verification. All audio samples sent to the API must be in the following format:
Container: WAV
Encoding: PCM
Rate: 16 kHz
Sample format: 16-bit
Channels: Mono
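A sample can be checked against these requirements before it is uploaded. The sketch below is not part of voice_id: `wav_format_ok?` is a hypothetical helper, and it assumes a canonical WAV file whose `fmt ` chunk directly follows the RIFF header (files with extra chunks in front would need a fuller parser).

```ruby
# Hypothetical helper (not part of voice_id). Assumes a canonical WAV layout
# where the "fmt " chunk directly follows the 12-byte RIFF/WAVE header.
def wav_format_ok?(header)
  return false if header.nil? || header.bytesize < 36
  return false unless header[0, 4] == "RIFF" && header[8, 4] == "WAVE"
  return false unless header[12, 4] == "fmt "

  # fmt chunk body: format tag, channels, sample rate, byte rate,
  # block align, bits per sample (all little-endian).
  format_tag, channels, sample_rate, _byte_rate, _block_align, bits =
    header[20, 16].unpack("vvVVvv")

  format_tag == 1 &&        # 1 = uncompressed PCM
    channels == 1 &&        # mono
    sample_rate == 16_000 && # 16 kHz
    bits == 16               # 16-bit samples
end

# Usage (the path is a placeholder):
#   wav_format_ok?(File.binread('/path/to/some/audio_file.wav', 44))
```

Reading only the first 44 bytes keeps the check cheap, at the cost of rejecting valid files that use a non-canonical chunk order.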
Identify a person from a list of people. This is a text-independent API. Before a speaker can be identified, the speaker (profile) must submit a minimum of 30 seconds of recognizable audio.
identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")
Each person needs a unique profile; this creates a new one.
profile = identification.create_profile
# => { "identificationProfileId" => "49a36324-fc4b-4387-aa06-090cfbf0064f" }
An enrollment is how audio samples are associated with a profile (training the service). For the Identification API, a minimum of 30 seconds of recognizable speech is required. This can be reached through multiple enrollments. This method creates a new enrollment for a profile.
profile_id = "1234567890"
path_to_audio = '/path/to/some/audio_file.wav'
short_audio = true # true - set minimum duration to 1 sec (5 sec by default per enrollment)
identification.create_enrollment(profile_id, short_audio, path_to_audio)
# returns an operation url that you can use to check the status of the enrollment
# => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
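Because the 30 seconds can be spread over several enrollments, it helps to check how much speech is still missing after each one. The sketch below reads the fields that appear in the operation status output shown in this README. Both helpers are hypothetical (not part of voice_id), and treating `"Enrolled"` as the completed state is an assumption based on the `enrollmentStatus` values listed for profiles.

```ruby
# Hypothetical helpers (not part of voice_id) that read the hash returned by
# #get_operation_status for an enrollment operation.
def remaining_enrollment_seconds(status)
  status.dig("processingResult", "remainingEnrollmentSpeechTime")
end

# Assumption: "Enrolled" marks a profile with enough speech on record.
def enrollment_complete?(status)
  status.dig("processingResult", "enrollmentStatus") == "Enrolled"
end

status = {
  "status" => "succeeded",
  "processingResult" => {
    "enrollmentStatus" => "Enrolling",
    "remainingEnrollmentSpeechTime" => 16.4,
    "speechTime" => 13.6,
    "enrollmentSpeechTime" => 13.6
  }
}

remaining_enrollment_seconds(status) # => 16.4
enrollment_complete?(status)         # => false
```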
Certain endpoints take time to process, so they return a URL that you can use to check on the status of the operation. Use this method to parse the operation ID out of that URL. You can then pass the ID to #get_operation_status.
operation_status_url = identification.create_enrollment(profile_id, short_audio, path_to_audio)
# => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
identification_operation_id = identification.get_operation_id(operation_status_url)
# => "EF217D0C-9085-45D7-AAE0-2B36471B89B5"
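As the sample URL suggests, the operation ID is the last path segment of the operation URL. The snippet below illustrates that relationship with Ruby's standard `uri` library. `operation_id_from` is a hypothetical name, and this is not necessarily how #get_operation_id is implemented in the gem.

```ruby
require "uri"

# Illustration only (not voice_id's implementation): the operation ID is the
# last path segment of the operation URL.
def operation_id_from(url)
  URI.parse(url).path.split("/").last
end

operation_id_from("https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5")
# => "EF217D0C-9085-45D7-AAE0-2B36471B89B5"
```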
Check the status of an operation by passing in the operation ID (use #get_operation_id to get the ID).
identification.get_operation_status(identification_operation_id)
# =>
# {
# "status" => "succeeded",
# "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
# "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
# "processingResult" => {
# "identifiedProfileId" => "49a59333-ur9d-4387-wd06-880cfby0215f",
# "confidence"=>"High"
# }
# }
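Since operations finish asynchronously, callers typically poll until the status settles. The following is a minimal polling sketch, not part of voice_id. `wait_for_operation` is a hypothetical helper, `client` stands for any object that responds to #get_operation_status, and it assumes `"succeeded"` and `"failed"` are the terminal status values (only `"succeeded"` appears in this README).

```ruby
# Hypothetical helper (not part of voice_id). `client` is any object that
# responds to #get_operation_status, e.g. a VoiceId::Identification instance.
# Assumes "succeeded" and "failed" are the terminal status values.
def wait_for_operation(client, operation_id, interval: 1, max_attempts: 10)
  max_attempts.times do
    status = client.get_operation_status(operation_id)
    return status if %w[succeeded failed].include?(status["status"])

    sleep interval
  end
  nil # still not finished after max_attempts polls
end

# Usage:
#   status = wait_for_operation(identification, identification_operation_id)
```

Returning `nil` on timeout leaves it to the caller to decide whether to retry or give up.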
Delete a particular profile from the service.
profile_id = "1234567890"
identification.delete_profile(profile_id)
# => true
Returns a list of all the profiles for this account.
identification.get_all_profiles
# =>
# [
# {
# "identificationProfileId" => "111f427c-3791-468f-b709-fcef7660fff9",
# "locale" => "en-US",
# "enrollmentSpeechTime" => 0.0,
# "remainingEnrollmentSpeechTime" => 0.0,
# "createdDateTime" => "2015-04-23T18:25:43.511Z",
# "lastActionDateTime" => "2015-04-23T18:25:43.511Z",
# "enrollmentStatus" => "Enrolled" //[Enrolled | Enrolling | Training]
# }, ...
# ]
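Speaker identification expects enrolled profiles, so the list returned here can be filtered before it is used. The sketch below assumes the hash shape shown above; `enrolled_profile_ids` is a hypothetical helper and not part of voice_id.

```ruby
# Hypothetical helper (not part of voice_id): keep only the IDs of profiles
# whose enrollmentStatus is "Enrolled".
def enrolled_profile_ids(profiles)
  profiles
    .select { |profile| profile["enrollmentStatus"] == "Enrolled" }
    .map { |profile| profile["identificationProfileId"] }
end

# Usage:
#   profile_ids = enrolled_profile_ids(identification.get_all_profiles)
```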
Returns a profile's details
profile_id = "1234567890"
identification.get_profile(profile_id)
# =>
# {
# "identificationProfileId" => "111f427c-3791-468f-b709-fcef7660fff9",
# "locale" => "en-US",
# "enrollmentSpeechTime" => 0.0,
# "remainingEnrollmentSpeechTime" => 0.0,
# "createdDateTime" => "2015-04-23T18:25:43.511Z",
# "lastActionDateTime" => "2015-04-23T18:25:43.511Z",
# "enrollmentStatus" => "Enrolled" //[Enrolled | Enrolling | Training]
# }
Resets all the enrollments for a particular profile
profile_id = "1234567890"
identification.reset_all_enrollments_for_profile(profile_id)
# => true
Identify a speaker by calling this method with an array of enrolled profile_ids. Use short_audio to waive the 5-second minimum for the speech sample. The audio sample to be analyzed should ideally be 30 seconds long, with a maximum of 5 minutes.
profile_ids = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b", ...]
path_to_test_audio = '/path/to/some/audio_file.wav'
short_audio = true
operation_url = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
# => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B6"
identification_operation_id = identification.get_operation_id(operation_url)
# => "EF217D0C-9085-45D7-AAE0-2B36471B89B6"
identification.get_operation_status(identification_operation_id)
# =>
# {
# "status" => "succeeded",
# "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
# "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
# "processingResult" => {
# "identifiedProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f",
# "confidence"=>"High"
# }
# }
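An application will usually act on the result only when the operation succeeded and the confidence is acceptable. The sketch below shows one way to read the status hash. `confident_match` is a hypothetical helper (not part of voice_id), and `"High"` is the only confidence value shown in this README, so any other accepted values would be assumptions.

```ruby
# Hypothetical helper (not part of voice_id). Returns the identified profile
# ID, or nil when the operation has not succeeded or the confidence is not in
# the accepted list. "High" is the only confidence value this README shows.
def confident_match(status, accepted: ["High"])
  return nil unless status["status"] == "succeeded"

  result = status["processingResult"] || {}
  accepted.include?(result["confidence"]) ? result["identifiedProfileId"] : nil
end

# Usage:
#   status = identification.get_operation_status(identification_operation_id)
#   profile_id = confident_match(status)
```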
Verify that a person is who they say they are. This is a text-dependent API. Before a speaker can be verified, the speaker (profile) must submit three audio samples (of phrases from an API-provided list) with their enrollment.
verification = VoiceId::Verification.new("MS_speaker_recognition_api_key")
Get a list of accepted scripts to use when sending your audio sample.
verification.list_all_verification_phrases
# =>
# [
# {"phrase" => "i am going to make him an offer he cannot refuse"},
# {"phrase" => "houston we have had a problem"},
# {"phrase" => "my voice is my passport verify me"},
# {"phrase" => "apple juice tastes funny after toothpaste"},
# {"phrase" => "you can get in without your password"},
# {"phrase" => "you can activate security system now"},
# {"phrase" => "my voice is stronger than passwords"},
# {"phrase" => "my password is not your business"},
# {"phrase" => "my name is unknown to you"},
# {"phrase" => "be yourself everyone else is already taken"}
# ]
Creating a profile works the same as in the Identification API.
Requires 3 enrollments. Pick 3 of the accepted phrases from #list_all_verification_phrases and enroll them.
verification.create_enrollment("49a46324-fc4b-4387-aa06-090cfbf0214f", '/path/to/audio/make_him_an_offer.wav')
# =>
# {
# "enrollmentStatus" => "Enrolling",
# "enrollmentsCount" => 1,
# "remainingEnrollments" => 2,
# "phrase" => "i am going to make him an offer he cannot refuse"
# }
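The three enrollments can be submitted in a loop that stops once the service reports none remaining. This is a minimal sketch, not part of voice_id. `enroll_until_complete` is a hypothetical helper, and `client` stands for any object that responds to #create_enrollment(profile_id, path) and returns a hash shaped like the one above.

```ruby
# Hypothetical helper (not part of voice_id). `client` is any object that
# responds to #create_enrollment(profile_id, path), e.g. a
# VoiceId::Verification instance. Submits recordings in order and stops as
# soon as the response reports zero remaining enrollments.
def enroll_until_complete(client, profile_id, audio_paths)
  last_response = nil
  audio_paths.each do |path|
    last_response = client.create_enrollment(profile_id, path)
    break if last_response["remainingEnrollments"] == 0
  end
  last_response
end

# Usage (paths are placeholders):
#   enroll_until_complete(verification, profile_id, ['/path/to/audio/one.wav', ...])
```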
The remaining profile-management methods work the same as in the Identification API.
The user (profile) must already have enrolled with 3 of the accepted phrases (#list_all_verification_phrases). Once the enrollments have been accepted, a recording of one of the accepted phrases can be checked against the enrolled profile.
verification.verify_speaker("86935587-b631-4cc7-a59t-8e580d71522g", "/path/to/audio/offer_converted.wav")
# =>
# {
# "result" => "Accept",
# "confidence" => "High",
# "phrase" => "i am going to make him an offer he cannot refuse"
# }
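The returned hash can be reduced to a yes/no decision. The sketch below is one possible policy, not part of voice_id. `verified?` is a hypothetical helper, and requiring `"High"` confidence is an illustrative choice rather than a rule of the service.

```ruby
# Hypothetical helper (not part of voice_id). Treats a verification as passed
# only when the result is "Accept" and the confidence is in the accepted list.
def verified?(result, accepted_confidence: ["High"])
  result["result"] == "Accept" &&
    accepted_confidence.include?(result["confidence"])
end

# Usage (profile_id and the path are placeholders):
#   verified?(verification.verify_speaker(profile_id, '/path/to/audio/offer_converted.wav'))
```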