This repo is no longer in active development; please use the audiostack SDK to continue using api.audio.
apiaudio - python SDK
apiaudio is the official api.audio Python 3 SDK. This SDK provides easy access to the api.audio API for applications written in Python.
📝 Table of Contents
🧐 About
This repository is actively maintained by Aflorithmic Labs. For examples, recipes and api reference see the api.audio docs. Feel free to get in touch with any questions or feedback!
:book: Changelog
You can view our updated Changelog here.
:speedboat: Quickstarts
Get started with our quickstart recipes.
🏁 Getting Started
Installation
You don't need this source code unless you want to modify it. If you want to use the package, just run:
pip install apiaudio -U
pip3 install apiaudio -U
Install from source with:
python setup.py install
python3 setup.py install
Prerequisites
Python 3.6+
🚀 Hello World
Create a file hello.py
touch hello.py
Authentication
This library needs to be configured with your account's api-key, which is available in your api.audio Console. Import the apiaudio package and set apiaudio.api_key
with the api-key you got from the console:
import apiaudio
apiaudio.api_key = "your-key"
Create text-to-audio in 4 steps
Let's create our first audio asset.
✍️ Create a new script, our scriptText
will be the text that is later synthesized.
script = apiaudio.Script.create(scriptText="Hello world")
print(script)
🎤 Render the scriptText that was created in the previous step. Let's use the voice Aria.
response = apiaudio.Speech.create(scriptId=script["scriptId"], voice="Aria")
print(response)
🎧 Now let's join the speech we just created with a sound template.
response = apiaudio.Mastering.create(
scriptId=script.get("scriptId"),
soundTemplate="jakarta"
)
print(response)
Download the final audio asset to your current working directory:
filepath = apiaudio.Mastering.download(scriptId=script["scriptId"], destination=".")
print(filepath)
Easy, right? 🔮 This is the final hello.py
file.
import apiaudio
apiaudio.api_key = "your-key"
script = apiaudio.Script.create(scriptText="Hello world")
response = apiaudio.Speech.create(scriptId=script["scriptId"], voice="Aria")
print(response)
response = apiaudio.Mastering.create(
scriptId=script.get("scriptId"),
soundTemplate="jakarta"
)
print(response)
filepath = apiaudio.Mastering.download(scriptId=script["scriptId"], destination=".")
print(filepath)
Now let's run the code:
python hello.py
python3 hello.py
Once this has completed, find the downloaded audio asset and play it! :sound: :sound: :sound:
📑 Documentation
Import
import apiaudio
Authentication
The library needs to be configured with your account's secret key which is available in your Aflorithmic Dashboard. Set apiaudio.api_key
with the api-key you got from the dashboard:
apiaudio.api_key = "your-key"
Authentication with environment variable (recommended)
You can also authenticate using the apiaudio_key
environment variable, and the apiaudio SDK will automatically use it. To set it up, open the terminal and type:
export apiaudio_key=<your-key>
If you provide both the environment variable and an apiaudio.api_key
value, the apiaudio.api_key
value will be used.
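The same variable can also be set from Python before the SDK is used (a minimal sketch; "your-key" is a placeholder for your real key):

```python
import os

# apiaudio reads the apiaudio_key environment variable automatically,
# so setting it before using the SDK is equivalent to the export above.
os.environ["apiaudio_key"] = "your-key"  # placeholder for your real key
```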
Super Organizations
In order to control a child organization of yours, please use the following method to assume that organization id.
Set your child organization id to None
to stop assuming an organization. Subsequent calls to the api will use your own organization id.
import apiaudio
apiaudio.set_assume_org_id('child_org_id')
apiaudio.set_assume_org_id(None)
See the Organization resource for more operations you can perform on your organization.
Resource Usage
There are two approaches to using the resources.
The recommended approach is to import all resources directly from apiaudio:
import apiaudio
apiaudio.Script.create()
Alternatively, you can import the resource classes you want to use first, and then use the resource methods. For example, to use Script
, we could do:
from apiaudio import Script
Script.create()
The same logic applies to other resources (Speech, Voice, Sound, ...).
Organization
resource
The Organization resource/class allows you to perform some data retrieval about your organization and your child organizations.
Organization methods are:
get_org_data()
- Get organizations data, including orgId, orgName etc.
list_child_orgs()
- List your child organizations.
get_secrets()
- Get your api key, webhook url and webhook secret.
Script
resource
The Script resource/class allows you to create, retrieve and list scripts. Learn more about scripts here.
Script methods are:
-
create()
- Create a new script.
- Parameters:
- scriptText * [Required] (string) - Text for your script. A script can contain multiple sections and SSML tags. Learn more about scriptText details here.
- projectName (string) - The name of your project. Default value is "default" (max 60 characters).
- moduleName (string) - The name of your module. Default value is "default" (max 60 characters).
- scriptName (string) - The name of your script. Default value is "default" (max 60 characters).
- scriptId (string) - Custom identifier for your script. If the scriptId parameter is provided, then projectName, moduleName and scriptName are set to the same value as scriptId.
- versions (dictionary) - A dictionary containing different versions of your script text, whereby the key is the version name and its value is the associated scriptText. Version name v0 is reserved as the default scriptText. Default value is "{}".
- metadata (dictionary) - Metadata for your script. This is limited to 2kb in size.
- Example:
text = """
<<sectionName::hello>> Hello {{username|buddy}}
<<sectionName::bye>> Good bye from {{location|barcelona}}
"""
script = apiaudio.Script.create(
scriptText=text,
projectName="myProject",
moduleName="myModule",
scriptName="myScript",
scriptId="id-1234",
metadata={"author" : "sam", "tags" : ["demo", "intro"]}
)
script = apiaudio.Script.create(
scriptText="Default text",
versions={"es" : "Hola", "en" : "hello"}
)
-
retrieve()
- Retrieve a script by id.
-
preview()
- Return a script with the dictionary highlighting applied. See Lexi for more examples of how to use the dictionary feature.
-
Parameters:
- scriptId * [Required] (string) - The script ID you want to use.
- voice * [Required] (string) - The voice that will be used to render speech. This is required as the output can be dependent on voice, language code, or provider.
-
Example:
text = """
The author of this repo has lived in two places in the
UK, <!location>Bude<!> and <!location>Bristol<!>.
"""
r = apiaudio.Script.create(scriptText=text)
scriptId = r["scriptId"]
preview = apiaudio.Script.preview(scriptId=scriptId, language="en-gb")
-
list()
- List all scripts available in your organization. This method supports filtering.
-
delete()
- Deletes a script. By default this will delete all versions of the script.
-
delete_multiple()
- Delete multiple scripts.
-
get_random_text()
- Retrieve random text from a list of categories.
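A short sketch combining these methods (the scriptId value is illustrative, and the keyword follows the convention used elsewhere in this SDK):

```python
import apiaudio
apiaudio.api_key = "your-key"

# List all scripts in the organization.
scripts = apiaudio.Script.list()

# Retrieve one script by id.
script = apiaudio.Script.retrieve(scriptId="id-1234")

# Delete a script (all versions by default).
apiaudio.Script.delete(scriptId="id-1234")
```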
Script.Directory
resource
Lists the directory structure of your scripts, i.e. projectName/moduleName/scriptName
Script.Directory methods are:
-
list_projects()
- Lists all projects.
-
Parameters:
-
Example:
projects = apiaudio.Script.Directory.list_projects()
print(projects)
-
list_modules()
- Lists all modules within a projectPrefix.
-
Parameters:
- projectPrefix * [Required] - will list modules starting with this projectPrefix.
-
Example:
modules = apiaudio.Script.Directory.list_modules(projectPrefix="workout")
print(modules)
-
list_script_names()
- Lists all script names within a projectPrefix and modulePrefix.
-
Parameters:
- projectPrefix * [Required] - will list scriptNames starting with this projectPrefix.
- modulePrefix * [Required] - will list scriptNames starting with this modulePrefix.
-
Example:
scriptNames = apiaudio.Script.Directory.list_script_names(projectPrefix="workout_1", modulePrefix="over60s")
print(scriptNames)
Speech
resource
Speech allows you to do Text-To-Speech (TTS) with our API using all the voices available. Use it to create a speech audio file from your script.
Speech methods are:
-
create()
Send a Text-To-Speech request to our Text-To-Speech service.
-
Parameters:
- scriptId * [Required] (string) - The script ID.
- version (string) - The version of the script to be produced. Default is "".
- voice (string) - Voice name. See the list of available voices using the Voice resource. Default voice is "Joanna".
- speed (string) - Voice speed. Default speed is 100.
- effect (string) - Put a funny effect in your voice. You can try the following ones: dark_father, chewie, 88b, 2r2d.
- silencePadding (integer) - Add a silence padding to your speech tracks (in milliseconds). Default is 0 (no padding).
- audience (dict) - Specify the values of parameters in your script. For instance, if in the script resource you have scriptText="Hello {{name}} {{lastname}}, welcome to {{location}}", the audience should be: {"name": "Elon", "lastname": "Musk", "location": "Istanbul"}. If not provided, the fallback track will be created.
- sync (boolean) - Allow sync or async speech creation. Default is True. If sync=False, the speech create call will return a success message when the speech creation is triggered. To retrieve the files, check the Speech.retrieve() method.
- sections (dict) - Specify parameters for specific sections in the script. The key is a section name, and the value is another dictionary with the section configuration (valid parameters are: voice, speed, effect, silence_padding). If a section is not found here, it will automatically inherit the voice, speed, effect and silence_padding values you defined above (or the default ones if you don't provide them). See an example below with 2 sections and different configuration parameters being used.
- useDictionary (bool) - Applies the pronunciation dictionary to the script text.
- useTextNormalizer (bool) - Applies text normalization, which can help resolve grammatical errors with TTS pronunciations, for example 11:12 Uhr -> 11 Uhr 12. Note - this only works for German voices at present.
sections={
"firstsection": {
"voice": "Matthew",
"speed": 110,
"silence_padding": 100
},
"anothersection": {
"voice": "en-GB-RyanNeural",
"speed": 100
}
}
-
Simple example:
response = apiaudio.Speech.create(
scriptId="id-1234",
voice="Joanna"
)
-
Complete example:
response = apiaudio.Speech.create(
scriptId="id-1234",
version="abc",
voice="Matthew",
speed=100,
effect="dark_father",
silencePadding=1000,
sync=True,
audience={"username": "Elon", "lastname": "Musk"},
sections={
"firstsection": {
"voice": "Matthew",
"speed": 110,
"silence_padding": 100,
},
"anothersection": {
"voice": "Liam",
}
}
)
-
retrieve()
Retrieve the speech file urls.
-
download()
Download the speech files in your preferred folder.
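As a sketch, following the same calling convention as the Mastering methods below (the exact keywords here are assumptions):

```python
import apiaudio
apiaudio.api_key = "your-key"

# Retrieve the URLs of the generated speech files.
urls = apiaudio.Speech.retrieve(scriptId="id-1234")

# Download the speech files to the current directory.
files = apiaudio.Speech.download(scriptId="id-1234", destination=".")
```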
Voice
resource
Voice allows you to retrieve a list of the available voices from our API.
Voice methods are:
-
list()
List all the available voices in our API. The parameters are all optional, and can be used in combination to get the perfect voice for your usecase.
- Parameters:
- provider (string) - Try one of: google, polly, azure, msnr (aflorithmic), ibm, yandex, retro (aflorithmic), vocalid, resemble.
- language (string) - e.g. english, spanish, french, german, etc.
- accent (string) - e.g. american, british, neutral, portuguese/brazilian, american soft, mexican, australian.
- gender (string) - Try with one of: male, female.
- ageBracket (string) - Try with one of: adult, child, senior.
- tags (string) - Try with one or more (separated by commas) of: steady, confident, balanced, informative, serious, instructional, slow, storytelling, calm, clear, deep, formal, sad, thin, fast, upbeat, fun, energetic, tense, very fast, flat, low pitched, high pitched, low-pitched, sing-y, cooperative, kind, stable, monotonous, neutral, responsible, business man, straight to the point, knowledgeable, focused, newscastery, newsreader, interviewer, reliable, friendly, welcoming, good for handing out information, slightly friendly.
- industryExamples (string) - Try with one or more (separated by commas) of: fitness, business, commercial, fashion, travel, audiobook, real estate, faith, health industry, comercial, realestate, kids entertainment, games, customer service, education, storytelling, entertainment, kids, education audiobook.
- timePerformance (string) - The time performance of the voice. There are three categories: slow, medium, fast.
- sectionCharacterLimit (string) - The maximum number of characters that the voice can process per Script section. All of the supported providers, with the exception of VocalId, have a limit of 4000.
- Example:
all_voices = apiaudio.Voice.list()
- Example:
french_voices = apiaudio.Voice.list(language="french",tags="steady, fun")
-
list_parameters()
This method lets you see which attributes you can filter the voices by, along with the allowed values for each attribute. You can later use these parameters and values to filter the voices you wish to list.
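For example:

```python
import apiaudio
apiaudio.api_key = "your-key"

# Discover the attributes and values voices can be filtered by,
# then use them with Voice.list().
parameters = apiaudio.Voice.list_parameters()
print(parameters)
```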
Sound
resource
Sound allows you to design your own sound template from a script and a background track. In order to get a sound template/project, make sure you requested speech for your script resource first.
Sound methods are:
-
list()
List all the available sound templates in our api. The parameters are all optional, and can be used in combination to get the perfect sound for your usecase.
-
list_parameters()
This method lets you see which attributes you can filter the sound templates by, along with the allowed values for each attribute. You can later use these parameters and values to filter the sound templates you wish to list.
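For example:

```python
import apiaudio
apiaudio.api_key = "your-key"

# All available sound templates.
templates = apiaudio.Sound.list()

# Attributes and values the templates can be filtered by.
parameters = apiaudio.Sound.list_parameters()
```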
Mastering
resource
Mastering allows you to create and retrieve a mastered audio file of your script. A mastered version contains the speech of the script, a background track, personalised parameters for your audience and a mastering process to enhance the audio quality of the whole track. In order to get a mastered audio file, make sure you requested speech for your script resource first.
Mastering methods are:
-
create()
Create a mastered version of your script and choose the audio format.
-
Parameters:
- scriptId * [Required] (string) - The script resource ID.
- version (string) - The version of the script to be produced. Default is "".
- soundTemplate (string) - The sound template name. For the list of available sound templates, check the apiaudio.Sound.list_sound_templates() call.
- public (boolean) - Boolean flag that allows storing the mastered file in a public s3 folder. Default value is False. Warning - this will make your mastered files public to anyone on the internet. Use this at your own risk.
- vast (boolean) - Boolean flag that allows creating a VAST file of your mastered file. The vast flag only works if public is True. Default value is False.
- endFormat (list) - List of audio formats to be produced. Valid formats are: ["wav", "mp3" (default), "flac", "ogg", "mp3_very_low", "mp3_low", "mp3_medium", "mp3_high", "mp3_very_high", "mp3_alexa"].
- forceLength (int) - Force the audio length of the mastered track (in seconds).
- audience (dict) - Dictionary containing the personalisation parameters. This parameter depends on the number of parameters you used in your script resource. In the script documentation example above, we used 2 parameters: username and location, and in the example below we want to produce the script for username salih with location Barcelona. If audience is not provided, the fallback track will be created.
- mediaFiles (list) - List of dicts containing the media files. This parameter depends on the media file tags used in the script resource and the media files you have in your account. For example, if the script contains <<media::myrecording>> plus <<media::mysong>>, and you want to attach myrecording to mediaId = "12345" and mysong to mediaId = "67890", then mediaFiles = [{"myrecording":"12345", "mysong":"67890"}].
- mediaVolumeTrim (float) - Floating point variable that allows you to trim the volume of uploaded media files (in dB). This attribute has a valid range of -12 to 12 dB and applies to all media files included in a single mastering call. Clipping protection is not provided, so only make incremental adjustments.
- connectors (list) - List of dicts specifying the configuration for a particular 3rd party connection. For guidelines in the context of supported 3rd party applications, see the connectors documentation.
- masteringPreset (string) - The mastering preset to use; this enables features such as sidechain compression (i.e. ducking). See apiaudio.Mastering.list_presets() for a list of presets and their descriptions.
- share (boolean) - If you would like a shareable link created with your audio file, use this flag. If you set share: True, the response will have a shareUrl parameter returned. (Note: if you use this flag, your private files will be converted to public files.)
-
Example:
response = apiaudio.Mastering.create(
scriptId="id-1234",
soundTemplate="jakarta",
audience={"username":"salih", "location":"barcelona"}
)
-
create_media_timeline()
Creates a mastering request based purely on uploaded media files. Media files will need to be uploaded before calling this function. See media.
-
Parameters:
-
- timeline * [Required] (list) - The timeline object. A timeline object is a list of dictionaries, whereby each represents a track of audio files. Each track must have the following two keys: files and contentType.
  - files * [Required] (list) - A list of dictionaries, whereby each entry must have the following 3 keys: mediaId, startAt and endAt.
    - mediaId * [Required] (str) - The mediaId of the file that has been uploaded.
    - startAt * [Required] (float) - The time that this media file should start at, in seconds.
    - endAt * [Required] (float) - The time that this media file should end at, in seconds.
  - contentType * [Required] (string) - The type of content that this track contains; should be either sound or speech.
- endFormat (list) - List of audio formats to be produced. Valid formats are: ["wav", "mp3" (default), "flac", "ogg", "mp3_very_low", "mp3_low", "mp3_medium", "mp3_high", "mp3_very_high", "mp3_alexa"].
- masteringPreset (string) - The mastering preset to use; this enables features such as sidechain compression (i.e. ducking). See apiaudio.Mastering.list_presets() for a list of presets and their descriptions.
-
Example:
backgroundId = apiaudio.Media.upload(file_path="background.wav")["mediaId"]
speechId = apiaudio.Media.upload(file_path="speech1.wav")["mediaId"]
timeline = [
{
"files" : [
{
"mediaId" : speechId,
"startAt" : 2,
"endAt" : 14,
}
],
"contentType" : "speech"
},
{
"files" : [
{
"mediaId" : backgroundId,
"startAt" : 0,
"endAt" : 45,
}
],
"contentType" : "sound"
}
]
response = apiaudio.Mastering.create_media_timeline(timeline=timeline, masteringPreset="lightducking")
-
retrieve()
Retrieves the mastered file urls.
-
download()
Download the mastered files in your preferred folder.
-
Parameters:
- scriptId * [Required] (string) - The script resource ID.
- version (string) - The version of the script to be downloaded. Default is "".
- parameters (dict) - Dictionary containing the audience item you want to retrieve. If parameters are not provided, the fallback track will be downloaded.
- destination (string) - The folder destination path. Default is "." (current folder).
- public (boolean) - Boolean flag that allows retrieving the mastered file from the public bucket. Use this if you want to retrieve a mastered file created using public=True. Default value is False.
- vast (boolean) - Boolean flag that allows retrieving the VAST file of your mastered file. The vast flag only works if public is True. Default value is False.
-
Example:
mastered_files = apiaudio.Mastering.download(
    scriptId="id-1234",
    parameters={"username":"salih", "location":"barcelona"},
    destination="."
)
-
list_presets()
List the available mastering presets.
-
Example:
presets = apiaudio.Mastering.list_presets()
print(presets)
Media
resource
Media allows you to retrieve all the files available in api.audio for your organization.
Media methods are:
-
upload()
- Upload files to our databases.
-
list()
- List all files within an org.
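For example (the file name is illustrative; upload's file_path keyword is taken from the timeline example later in this document):

```python
import apiaudio
apiaudio.api_key = "your-key"

# Upload a local file; the response includes its mediaId.
media = apiaudio.Media.upload(file_path="background.wav")
print(media["mediaId"])

# List every media file in the organization.
files = apiaudio.Media.list()
```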
-
list_tags()
This returns a list with all unique user defined tags.
-
Example:
tags = apiaudio.Media.list_tags()
print(tags)
-
get_download_url()
- This method returns a presigned url for downloading a specific audio file
- params:
- mediaId * [required] (string) - Media id for the file to be downloaded.
-
Example
url = apiaudio.Media.get_download_url(mediaId="some-mediaId")
print(url)
-
download()
- This method downloads a specific audio file
- params:
- mediaId * [required] (string) - Media id for the file to be downloaded.
- destination (string) - Path to the directory where the file will be downloaded. Default is ".".
-
Example
apiaudio.Media.download(
mediaId="some_mediaId",
destination="/my_destination_folder"
)
SyncTTS
resource
SyncTTS allows you to do Synchronous Text-To-Speech (TTS) with our API using all the voices available. Use it to create a speech audio file from a text and a voice name. The response contains wave bytes ready to be played or written to a file.
SyncTTS methods are:
Birdcache
resource
Birdcache is a caching service provided by api.audio: it stores data on api.audio servers for future use, providing a caching layer for the customer. This allows you to retrieve your speech files on the fly.
Birdcache methods are:
Pronunciation Dictionary
resource
Often when working with TTS, the models can fail to accurately pronounce specific words; for example, brands, names and locations are commonly mispronounced. As a first attempt to fix this we have introduced our lexi flag, which works in a similar way to SSML. For example, adding <!Peadar> instead of Peadar (who is one of our founders) to your script will cause the model to produce an alternative pronunciation of this name. This is particularly useful in cases where words can have multiple pronunciations, for example the cities ‘reading’ and ‘nice’. In this instance placing <!reading> and <!nice> will ensure that these are pronounced correctly, given the script:
" The city of <!nice> is a really nice place in the south of france."
If this solution does not work for you, you can instead make use of our custom (self-serve) lexi feature.
This can be used to achieve one of two things, correcting single words, or expanding acronyms. For example, you can replace all occurrences of the word Aflorithmic with “af low rhythmic” or occurrences of the word ‘BMW’ with “Bayerische Motoren Werke”. Replacement words can be supplied as plain text or an IPA phonemisation.
Pronunciation dictionary methods are:
-
list()
Lists the publicly available dictionaries and their words
-
list_custom_dicts()
Lists the custom dictionaries and their respective words
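For example (using the Lexi class shown later in this section):

```python
import apiaudio
apiaudio.api_key = "your-key"

# Public dictionaries and their words.
public_dicts = apiaudio.Lexi.list()

# Your organization's custom dictionaries and their words.
custom_dicts = apiaudio.Lexi.list_custom_dicts()
```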
-
register_custom_word
Adds a new word to a custom dictionary.
For each language, only a single word entry is permitted. However, each word can have multiple specializations. When a word is first registered, a default specialization is always created, which will match what is passed in. Subsequent calls with different specializations will only update the given specialization. The exact replacement that will be used is determined by the following order of preference:
voice name > language dialect > provider name > default
For example, a replacement specified for voice name sara will be picked over a replacement specified for provider azure.
-
list_custom_words()
Lists all the words contained in a custom dictionary.
-
Parameters:
-
- lang [required] (string) - Language family, e.g. en or es. Use global to list language-agnostic words.
-
Example:
words = apiaudio.Lexi.list_custom_words(lang="en")
Preview
The effect of applying the Pronunciation Dictionary can be seen with the Script.preview() method. See the Script documentation for more details.
-
Example:
text = """
The author of this repo has lived in two places in the
UK, <!Bude> and <!Bristol>
"""
r = apiaudio.Script.create(scriptText=text)
scriptId = r["scriptId"]
preview = apiaudio.Script.preview(scriptId=scriptId, voice="Joanna")
print(preview)
-
Response:
{"preview" : "The author of this repo has lived in two places in the UK, bude and <phoneme alphabet=\"ipa\" ph=\"###\"> bristol </phoneme>"}
In this example Bristol will be phonemised to ensure it is correctly pronounced, but as Bude is not in our dictionaries it is left as is. The exact IPA tokens for words in our internal dictionaries are obfuscated.
Connector
resource
Resource used for monitoring 3rd party integrations. End results of the Mastering resource can be distributed to external applications through the connectors
field. See the connectors documentation.
List of currently supported applications:
Available methods:
-
retrieve()
After registering a connector in the api.console, use this method to check whether a connection was successful using the provided credentials.
-
connection()
Check the status of the connection by providing connectionId
returned in a Mastering response.
Orchestrator
resource
The orchestrator is used to make working with a range of audio services as easy as sending a single API request. Each route here is carefully configured to produce high-quality, easy-to-access audio assets.
Orchestrator methods are:
-
create_audio()
Creates a simple TTS speech request and adds a sound template to it through mastering.
-
Parameters:
- scriptText * [Required] (str) - Text to synthesize (TTS).
- soundTemplate (str) - Sound template to use.
- voice * [Required] (str) - Name of the voice to use.
-
create_three_sections()
Creates a TTS speech request with 3 sections and adds a sound template to it through mastering.
-
Parameters:
- introText * [Required] (str) - Text to synthesize in the intro section.
- mainText * [Required] (str) - Text to synthesize in the main section.
- outroText * [Required] (str) - Text to synthesize in the outro section.
- soundTemplate (str) - Sound template to use.
- voice * [Required] (str) - Name of the voice to use.
-
media_with_sound()
Combines a pre-existing media file (i.e. a pre-recorded voice) with a sound template.
-
Parameters:
- mediaId * [Required] (str) - MediaId of the media file to use as input.
- soundTemplate * [Required] (str) - Sound template to use.
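Using the parameters documented above, a single call can produce a mastered asset (a minimal sketch; the values are illustrative):

```python
import apiaudio
apiaudio.api_key = "your-key"

# TTS the text and master it with a sound template in one request.
response = apiaudio.Orchestrator.create_audio(
    scriptText="Hello world",
    voice="Joanna",
    soundTemplate="jakarta",
)
print(response)
```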
Webhooks
This SDK provides an easy way of verifying apiaudio webhook call security headers. It is highly recommended for you to verify the headers in order to protect your server from any malicious attack.
The method is:
apiaudio.Webhooks.verify(payload, sig_header, secret, tolerance)
It will return true if the header is valid; otherwise it will raise an error.
The parameters to pass are: payload, the body object sent by apiaudio; sig_header, the X-Aflr-Secret value in the request headers sent by apiaudio; secret, your webhook secret (you can get it in the apiaudio console); and tolerance, the tolerance in seconds for the header checks, which defaults to 300 seconds.
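A minimal sketch of a webhook handler using this method (the handler function and variable names are illustrative):

```python
import apiaudio

def handle_webhook(payload, sig_header, secret):
    # payload: the body object sent by apiaudio
    # sig_header: the X-Aflr-Secret value from the request headers
    # secret: your webhook secret from the apiaudio console
    # Raises an error if the header is invalid.
    return apiaudio.Webhooks.verify(payload, sig_header, secret, tolerance=300)
```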
Logging
By default, warnings issued by the API are logged in the console output. Additionally, some behaviors are logged on the informational level (e.g. "In progress..." indicators during longer processing times).
The level of logging can be controlled by choosing from the standard levels in Python's logging
library.
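For example, to suppress the informational messages and keep only warnings (the logger name "apiaudio" is an assumption based on the package name):

```python
import logging

# Raise the threshold so "In progress..." info messages are hidden
# and only warnings and errors are shown.
logging.getLogger("apiaudio").setLevel(logging.WARNING)
```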
Maintainers
Development
There is a pre-commit hook that will run before you commit a file. This is to keep the code standards high. To enable it, run make; it will set up the pre-commit hook for git. That's all! Now, every time before you commit, it will run to tell you about the standards.
If you use VSCode for committing files, you may bump into a pre-commit command not found
error. That is OK; just run brew install pre-commit
or use your favourite package manager from the list here.
If you bump into a your pip version is old
error, just ignore it and use the terminal.
If there is a problem and you are in a rush, you can add --no-verify
at the end of the commit command to skip the pre-commit hooks, e.g. git commit -m 'your commit message' --no-verify
License
This project is licensed under the terms of the MIT license.