# NeuralSpace VoiceAI Python Client
## Installation

```shell
pip install -U neuralspace
```
## Authentication

Set your NeuralSpace API key in the `NS_API_KEY` environment variable:

```shell
export NS_API_KEY=YOUR_API_KEY
```

Alternatively, you can provide your API key as a parameter when initializing `VoiceAI`:

```python
import neuralspace as ns

vai = ns.VoiceAI(api_key='YOUR_API_KEY')
```
## Quickstart

### File Transcription

```python
import requests

import neuralspace as ns

# Download a sample audio file
filename = 'english_audio_sample.mp3'
print('Downloading sample audio file...')
resp = requests.get('https://github.com/Neural-Space/neuralspace-examples/raw/main/datasets/transcription/en/english_audio_sample.mp3')
with open(filename, 'wb') as fp:
    fp.write(resp.content)

vai = ns.VoiceAI()

# Create a transcription job
config = {
    'file_transcription': {
        'language_id': 'en',
        'mode': 'advanced',
    },
}
job_id = vai.transcribe(file=filename, config=config)
print(f'Created job: {job_id}')

# Check the job's current status
result = vai.get_job_status(job_id)
print(f'Current status:\n{result}')

# Wait until the job completes
print('Waiting for completion...')
result = vai.poll_until_complete(job_id)
print(result)
```
Output:

```
Downloading sample audio file...
Created job: 93e229c7-912d-43aa-9d87-96f873f69882
Current status:
{
  "success": true,
  "message": "Data fetched successfully",
  "data": {
    "timestamp": 1695210581508,
    "filename": "english_audio_sample.mp3",
    "jobId": "93e229c7-912d-43aa-9d87-96f873f69882",
    "filePath": "uploads/bf377596-7a1d-4de9-82a7-9799d83f0ad9",
    "params": {
      "file_transcription": {
        "language_id": "en",
        "mode": "advanced"
      }
    },
    "status": "Queued",
    "audioDuration": 131.568,
    "messsage": "",
    "progress": [
      "Queued"
    ]
  }
}
Waiting for completion...
{
  "success": true,
  "message": "Data fetched successfully",
  "data": {
    "timestamp": 1695210581508,
    "filename": "english_audio_sample.mp3",
    "jobId": "93e229c7-912d-43aa-9d87-96f873f69882",
    "params": {
      "file_transcription": {
        "language_id": "en",
        "mode": "advanced"
      }
    },
    "status": "Completed",
    "audioDuration": 131.568,
    "messsage": "",
    "progress": [
      "queued",
      "Started",
      "Transcription Started",
      "Transcription Completed",
      "Completed"
    ],
    "result": {
      "transcription": {
        "transcript": "We've been at this for hours now. Have you found anything useful in any of those books? Not a single thing, Lewis. I'm sure that there must be something in this library. It's not like there's nothing left to be discovered. Well, I have to say that I'm tired of searching. I'm gonna take a little break. You come and cut us. I am getting a little hungry. Do you want to get someone to eat? Yeah. Food town's great right about now. What was that noise, Curtis? Did you hear that? Yes, I heard that, Lewis. I don't know, but it sounded like it came from the back of the library. Let's check it out. Okay, where you go first? Looks like a book is falling off one of the shelves. It's an old book, but it looks a bit. It's a little dusty and I can't make out what it says. Look at this, Lewis. The last treasure of Lima. Lima? Isn't that the capital city of Peru? Yes, Lewis. And it looks like there's been a treasure missing for centuries now. Look at this, Lewis. Apparently, lost treasure is located inside a temple on the outskirts of Lima. Looks like this book is a map to the treasure. Either even corn that's written down on this page. Let's get some food and plan out this next adventure. As soon as we get to Peru, I'll go straight to these coordinates that are written in the book. Great, I'll talk to you again on our land. 92, 93, 94. I'll meet at the exact location Lewis and I don't see anything. There's absolutely nothing to be seen here, just trees. Faith, look around, is there anything written on any tree? I hope this wasn't a waste of time.",
        "timestamps": [
          {
            "word": "We've",
            "start": 6.69,
            "end": 7.03,
            "conf": 0.99
          },
          {
            "word": "been",
            "start": 7.03,
            "end": 7.09,
            "conf": 0.99
          },
          {
            "word": "at",
            "start": 7.09,
            "end": 7.23,
            "conf": 0.99
          },
          {
            "word": "this",
            "start": 7.23,
            "end": 7.37,
            "conf": 0.97
          },
          {
            "word": "for",
            "start": 7.37,
            "end": 7.47,
            "conf": 0.97
          },
          {
            "word": "hours",
            "start": 7.47,
            "end": 7.87,
            "conf": 0.56
          },
          {
            "word": "now.",
            "start": 7.87,
            "end": 8.43,
            "conf": 1
          }
          ...
        ]
      }
    }
  }
}
```
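Once a job completes, the transcript and per-word timings can be read out of the nested result. The sketch below simply walks the response structure shown in the output above; the `result` dict here is a hand-built stand-in for the value returned by `poll_until_complete`.

```python
# Stand-in for the dict returned by vai.poll_until_complete(job_id),
# trimmed down to the fields accessed below.
result = {
    "success": True,
    "data": {
        "status": "Completed",
        "result": {
            "transcription": {
                "transcript": "We've been at this for hours now.",
                "timestamps": [
                    {"word": "We've", "start": 6.69, "end": 7.03, "conf": 0.99},
                    {"word": "been", "start": 7.03, "end": 7.09, "conf": 0.99},
                ],
            }
        },
    },
}

# The transcript and timestamps live under data -> result -> transcription
transcription = result['data']['result']['transcription']
print(transcription['transcript'])

# Each timestamp entry has the word, start/end times in seconds, and a confidence score
for ts in transcription['timestamps']:
    print(f"{ts['word']:>10s}  {ts['start']:6.2f}-{ts['end']:6.2f}  conf={ts['conf']:.2f}")
```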
### Streaming Real-Time Transcription

The following example shows how to use NeuralSpace VoiceAI to transcribe microphone input in real time.
It uses the PyAudio library: `pip install pyaudio`.
PyAudio depends on the PortAudio library, which needs to be installed via your OS package manager.

```python
import json
import threading
from queue import Queue

import pyaudio

import neuralspace as ns

q = Queue()

# PyAudio callback: push each captured audio chunk onto the queue
def listen(in_data, frame_count, time_info, status):
    q.put(in_data)
    return (None, pyaudio.paContinue)

# Forward queued audio chunks to the websocket
def send_audio(q, ws):
    try:
        while True:
            data = q.get()
            ws.send_binary(data)
    except Exception:
        print('Stopped sending audio.')

vai = ns.VoiceAI()
pa = pyaudio.PyAudio()

with vai.stream('en') as ws:
    stream = pa.open(
        rate=16000,
        channels=1,
        format=pyaudio.paInt16,
        frames_per_buffer=4096,
        input=True,
        output=False,
        stream_callback=listen,
    )
    t = threading.Thread(target=send_audio, args=(q, ws))
    t.start()
    print('Listening...')
    try:
        while True:
            resp = ws.recv()
            resp = json.loads(resp)
            text = resp['text']
            if resp['full']:
                # Final transcript for a segment: clear the line, then print it
                print('\r' + ' ' * 120, end='', flush=True)
                print(f'\r{text}', flush=True)
            else:
                # Partial transcript: overwrite the current line in place
                if len(text) > 120:
                    text = f'...{text[-115:]}'
                print(f'\r{text}', end='', flush=True)
    except KeyboardInterrupt:
        print('\nFinishing.')
```
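To stream pre-recorded audio instead of microphone input, the same websocket interface can presumably be fed fixed-size PCM chunks. The sketch below shows only the chunking arithmetic, matching the 16 kHz, 16-bit mono, 4096-frames-per-buffer settings above; the actual send calls are left as comments since they need a live `ws` from `vai.stream(...)`.

```python
# Chunk parameters mirroring the PyAudio example above
FRAMES_PER_BUFFER = 4096
BYTES_PER_FRAME = 2  # 16-bit mono PCM
CHUNK_BYTES = FRAMES_PER_BUFFER * BYTES_PER_FRAME

# One second of silence as stand-in PCM data (16000 frames * 2 bytes each)
pcm = b'\x00\x00' * 16000

# Split the raw audio into fixed-size chunks; the final chunk may be shorter
chunks = [pcm[i:i + CHUNK_BYTES] for i in range(0, len(pcm), CHUNK_BYTES)]
print(f'{len(chunks)} chunks, first chunk {len(chunks[0])} bytes')

# With a live websocket, each chunk would be sent the same way as above:
# for chunk in chunks:
#     ws.send_binary(chunk)
```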
### Text to Speech

```python
import neuralspace as ns

vai = ns.VoiceAI()

data = {
    "text": "كيف حالك",
    "speaker_id": "ar-female-Nadia-saudi-neutral",
    "stream": True,
    "config": {
        "pace": 1,
        "volume": 1
    }
}

# With stream=True, the synthesized audio is returned directly as bytes
result = vai.synthesize(data=data)
print(f'result with stream=True:\n{result}')

# With stream=False, a job is created and the audio is saved for later retrieval
data['stream'] = False
result = vai.synthesize(data=data)
print(f'result with stream=False:\n{result}')

# Fetch the details of the job
job_id = result['data']['jobId']
result = vai.get_tts_job_status(job_id)
print(f'Details of the job:\n{result}')

# Delete the job and its associated files
result = vai.delete_tts_job(job_id)
print(f'Response after deleting the job:\n{result}')

# List previous TTS jobs, paginated
query_params = {
    "pageNumber": 2,
    "pageSize": 10,
    "sort": "asc"
}
result = vai.get_tts_jobs(query_params=query_params)
print(f'Fetching the details of all previous jobs:\n{result}')
```
Output:

```
result with stream=True:
b'RIFF$\xb4\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00"V\x00\x00D\xac\x00\x00\x02\x00\x10\x00data\x00\xb4\x00\x00\x0e\x00\x0b\x00\x0e\x00\x0f\x00\x0f\x00\x0e\x00\r\x00\t\x00\x0c\x00\r\x00\x12\x00\x0c\x00\r\x00\n\x00\x0b\x00\x0b\x00\x0f\x00\x10\x00\x0b\x00\r\x00\x11\x00\x10\x00\x11\x00\x12\x00\x11\x00\x11\x00\x0b\x00\x10\x00\x10\x00\x0b\x00\x12\x00\x0e\x00\t\x00\x12\x00\x19\x00\x14\x00\x12\x00\x0f\x00\x12\x00\r\x00\r\x00\x10\x00...'
result with stream=False:
{
  "success": true,
  "message": "Job created successfully",
  "data": {
    "jobId": "8cf89d36-b55e-4c4f-a480-65bcd8484fae",
    "timestamp": 1701418572768,
    "result": {
      "save_path": "https://largefilestoreprod.blob.core.windows.net/common/uploads/6272df27-81a6-442a-bb7a-f98b63243604"
    }
  }
}
Details of the job:
{
  "success": true,
  "message": "Data fetched successfully",
  "data": {
    "timestamp": 1701418685869,
    "jobId": "4170883b-5ef9-4395-8dd1-deef17e140f8",
    "text": "كيف حالك",
    "params": {
      "pace": 1,
      "volume": 1,
      "speaker_id": "ar-female-Nadia-saudi-neutral",
      "language_id": "ar"
    },
    "status": "Completed",
    "result": {
      "save_path": "https://largefilestoreprod.blob.core.windows.net/common/uploads/6272df27-81a6-442a-bb7a-f98b63243604"
    },
    "audioDuration": 2
  }
}
Response after deleting the job:
{
  "success": true,
  "message": "Job and associated files deleted successfully.",
  "data": {
    "deletedCount": 1
  }
}
Fetching the details of all previous jobs:
{
  "success": true,
  "message": "Data fetched successfully",
  "data": {
    "jobs": [
      {
        "timestamp": 1701326878096,
        "jobId": "a5bcc6fe-f3c6-4efa-ba26-d2e28e8e8914",
        "text": "hello how are you",
        "params": {
          "pace": 1,
          "volume": 1,
          "pitch_shift": 0.5,
          "pitch_scale": 0.5,
          "speaker_id": "ar-female-Nadia-saudi-neutral",
          "language_id": "ar"
        },
        "status": "Completed",
        "audioDuration": 2
      },
      {
        "timestamp": 1701326955634,
        "jobId": "b3bdddbe-3b85-4068-ba59-d659e7469bd3",
        "text": "hello how are you",
        "params": {
          "pace": 1,
          "volume": 1,
          "pitch_shift": 0.5,
          "pitch_scale": 0.5,
          "speaker_id": "ar-female-Nadia-saudi-neutral",
          "language_id": "ar"
        },
        "status": "Completed",
        "audioDuration": 2
      },
      {
        "timestamp": 1701326972188,
        "jobId": "567c720a-37c5-4132-bc10-647207d8e1ad",
        "text": "hello how are you",
        "params": {
          "pace": 1,
          "volume": 1,
          "pitch_shift": 0.5,
          "pitch_scale": 0.5,
          "speaker_id": "ar-female-Nadia-saudi-neutral",
          "language_id": "ar"
        },
        "status": "Completed",
        "audioDuration": 2
      }
      ...
    ],
    "total": 27,
    "pageSize": 10,
    "page": 2
  }
}
```
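With `stream=True`, the returned bytes already form a complete WAV payload (note the `RIFF...WAVEfmt` header in the output above), so they can presumably be written straight to a `.wav` file. A minimal sketch, using a truncated stand-in for the real bytes:

```python
# Stand-in for the bytes returned by vai.synthesize(data=data) with stream=True.
# The real value is a full WAV file; this is just the first few header bytes.
audio_bytes = b'RIFF$\xb4\x00\x00WAVEfmt '

# The bytes include the WAV header, so no extra encoding step is needed
out_path = 'tts_output.wav'
with open(out_path, 'wb') as fp:
    fp.write(audio_bytes)

print(f'Wrote {len(audio_bytes)} bytes to {out_path}')
```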
## More Features

To enable additional features for file transcription, such as automatic language detection, speaker diarization, translation, and more, check out the NeuralSpace VoiceAI Docs.
### List Languages

To get the list of supported language codes for each transcription type, use:

```python
# Languages supported for file transcription
langs = vai.languages('file')
# Languages supported for streaming transcription
langs = vai.languages('stream')
```

### List Voices

To get the list of supported voices along with their metadata, use:

```python
voices = vai.voices()
```
### Job Config

Instead of providing the config or params as a `dict`, you can provide it as a `str`, a `pathlib.Path`, or a file-like object.

```python
# Config as a JSON string
job_id = vai.transcribe(
    file='path/to/audio.wav',
    config='{"file_transcription": {"language_id": "en", "mode": "advanced", "number_formatting": "words"}}',
)

# Config as a path to a JSON file
job_id = vai.transcribe(
    file='path/to/audio.wav',
    config='path/to/config.json',
)

# Config as a file-like object
with open('path/to/config.json') as fp:
    job_id = vai.transcribe(
        file='path/to/audio.wav',
        config=fp,
    )
```
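When the config is built programmatically, the string form can be produced from a dict with the standard library rather than hand-written JSON. A small sketch (the dict contents mirror the example above):

```python
import json

# Build the config as a plain dict, then serialize it
config = {
    'file_transcription': {
        'language_id': 'en',
        'mode': 'advanced',
        'number_formatting': 'words',
    },
}
config_str = json.dumps(config)
print(config_str)
# config_str can then be passed as config=config_str to vai.transcribe()
```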
### Wait for Completion

You can also poll for the status and wait until the job completes:

```python
result = vai.poll_until_complete(job_id)
print(result['data']['result']['transcription']['transcript'])
```

Note: this blocks the calling thread until the job is complete.
### Callbacks

You can also provide a callback function when creating the job.
It will be called with the result once the job completes.

```python
def callback(result):
    print(f'job completed: {result["data"]["jobId"]}')
    print(result['data']['result']['transcription']['transcript'])

job_id = vai.transcribe(file='path/to/audio.wav', config=config, on_complete=callback)
```

Note: `transcribe()` returns the `job_id` as soon as the job is scheduled, and the provided callback is called on a new thread. The calling thread is not blocked in this case.
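If the calling thread eventually needs the result, a `threading.Event` can bridge the callback thread and the caller. Below is a minimal sketch of the pattern; the job completion is simulated with a `threading.Timer` standing in for the client's worker thread, so the snippet runs on its own.

```python
import threading

done = threading.Event()
results = {}

def callback(result):
    # Runs on the worker thread: stash the result, then wake the waiter
    results['job'] = result
    done.set()

# Simulated completion after 0.1 s; in real use the client would invoke
# callback via on_complete=callback passed to vai.transcribe(...)
threading.Timer(0.1, callback, args=({'data': {'jobId': 'demo-job'}},)).start()

# ...the main thread can do other work here while the job runs...

done.wait(timeout=5)  # block until the callback fires (or time out)
print(f"callback received: {results['job']['data']['jobId']}")
```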