@davi-ai/speechmarkdown-davi-js

Speech Markdown parser and formatters in TypeScript.

  • Version : 1.0.6 (latest)
  • Registry : npm
  • Maintainers : 2
  • License : MIT

speechmarkdown-davi-js

Speech Markdown grammar, parser, and formatters for use with JavaScript.

Supported platforms:

  • microsoft-azure

Partial / no support:

  • amazon-alexa
  • amazon-polly
  • amazon-polly-neural
  • google-assistant
  • samsung-bixby

How to use

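import { SpeechMarkdown } from '@davi-ai/speechmarkdown-davi-js'

const options = {
  // Generate SSML for Azure neural voices
  platform: 'microsoft-azure',
  // The <speak> tag is added manually afterwards (see below)
  includeSpeakTag: false,
  globalVoiceAndLang: {
    // Main voice, formatted as language-CULTURE-VoiceName
    voice: 'en-US-JennyMultilingualNeural',
    // Language the multilingual voice should speak, formatted as language-CULTURE
    lang: 'fr-FR'
  }
}

const speechMarkdownParser = new SpeechMarkdown(options)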

Several options are available; the most useful ones are :

  • platform : 'microsoft-azure' to generate SSML for Azure neural voices
  • includeSpeakTag : whether to add a <speak> tag at the beginning and a </speak> tag at the end
  • globalVoiceAndLang: { voice?: string, lang?: string } : added for Microsoft voices and the retorik-framework architecture. To use a specific voice as the main voice, put it in the 'voice' field,
    formatted as language-CULTURE-VoiceName (ex: en-US-GuyNeural, en-US-JennyNeural). When using a multilingual voice (ex: JennyMultilingualNeural), if the text has to be spoken in a language other than the voice's default language, add
    the 'lang' field with the desired language, formatted as language-CULTURE (ex: fr-FR, en-US, de-DE, ...)
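
The formats described above can be checked before the options are handed to the parser. The following is a minimal sketch, not part of the package: the `ParserOptions` and `GlobalVoiceAndLang` types, the two regular expressions and the `checkOptions` function are all hypothetical names introduced for illustration, and the patterns are deliberately narrow (they only cover the language-CULTURE and language-CULTURE-VoiceName shapes shown in the examples).

```typescript
// Hypothetical types mirroring the options described above.
// They are NOT the package's own typings.
interface GlobalVoiceAndLang {
  voice?: string
  lang?: string
}

interface ParserOptions {
  platform: string
  includeSpeakTag?: boolean
  globalVoiceAndLang?: GlobalVoiceAndLang
}

// language-CULTURE, for example fr-FR or en-US (simplified assumption)
const LANG_PATTERN = /^[a-z]{2,3}-[A-Z]{2}$/
// language-CULTURE-VoiceName, for example en-US-GuyNeural (simplified assumption)
const VOICE_PATTERN = /^[a-z]{2,3}-[A-Z]{2}-[A-Za-z0-9]+$/

// Returns a list of human-readable problems; an empty list means the
// voice and lang fields match the formats described in the text.
function checkOptions(options: ParserOptions): string[] {
  const problems: string[] = []
  const global = options.globalVoiceAndLang
  const voice = global ? global.voice : undefined
  const lang = global ? global.lang : undefined
  if (voice !== undefined && !VOICE_PATTERN.test(voice)) {
    problems.push('voice "' + voice + '" is not formatted as language-CULTURE-VoiceName')
  }
  if (lang !== undefined && !LANG_PATTERN.test(lang)) {
    problems.push('lang "' + lang + '" is not formatted as language-CULTURE')
  }
  return problems
}
```

Calling `checkOptions` on the options object from the usage example should return an empty list, while a `lang` value such as 'french' would be reported.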

With these parameters, you will receive a complete SSML string, except for the <speak> tag, which has to be added manually around it. We don't use includeSpeakTag = true
because it only adds a plain <speak> tag, and Microsoft voices need a complete tag as follows :

  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="fr-FR">
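
Adding the tag manually can be done with a small helper. This is a sketch under stated assumptions: `wrapInSpeakTag` is a hypothetical function that is not provided by the package, and `ssmlBody` is assumed to be the SSML string returned by the parser when includeSpeakTag is false. The attributes are copied from the tag shown above.

```typescript
// Hypothetical helper (not part of the package): wraps the parser output
// in the complete <speak> tag required by Microsoft voices.
function wrapInSpeakTag(ssmlBody: string, lang: string): string {
  const open =
    '<speak version="1.0"' +
    ' xmlns="http://www.w3.org/2001/10/synthesis"' +
    ' xmlns:mstts="https://www.w3.org/2001/mstts"' +
    ' xmlns:emo="http://www.w3.org/2009/10/emotionml"' +
    ' xml:lang="' + lang + '">'
  return open + ssmlBody + '</speak>'
}
```

The `lang` argument would typically be the same language-CULTURE value as the one given in globalVoiceAndLang, for example 'fr-FR'.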

Available Speech Markdown tags

There are many different tags and most of them have restrictions. To get the current documentation, go to docs.microsoft.com

As of 2023/07/28, the available tags are :

  • voice :
    • (text to be read with that voice)[voice:"voice name"]
    • the text can contain other tags except 'voice'
    • the voice name can be as follows :
      • language-CULTURE-VoiceName (ex: en-US-GuyNeural, en-US-JennyNeural)
      • full Microsoft name (ex: Microsoft Server Speech Text to Speech Voice (en-US, JennyMultilingualNeural))
    • example : (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
  • lang :
    • (text to be read in this language)[lang:"language name"]
    • the text can contain other tags except 'voice' and 'lang'
    • the lang name must be formatted as language-CULTURE (ex: fr-FR, en-US)
    • example : (Bonjour, comment ça va ?)[lang:"en-US"]
  • break :
    • [break time in seconds / milliseconds] or [break:"strength value"]
    • strength values :
      • none
      • x-weak
      • weak
      • medium
      • strong
      • x-strong
    • example : [break:"strong"] / [1s] / [250ms]
  • silence :
    • [silence:"type value"]
    • type and value are required
    • type can be :
      • Leading : beginning of text
      • Tailing : end of text
      • SentenceBoundary : between adjacent sentences
    • value is an integer giving time in seconds or milliseconds, lower than 5000ms
    • example : [silence:"Leading 1s"]
  • prosody :
    • (text for which the prosody will be adjusted)[pitch:"value";contour:"value";range:"value";rate:"value";volume:"value"]
    • you can use any of the modifiers below, from one to all of them
    • modifiers :
      • pitch
      • contour
      • range
      • rate
      • volume
    • example : (this will be spoken slow and high)[rate:"slow";pitch:"high"]
  • emphasis :
    • [emphasis:"value"] or ++text will be strong++
    • value can be / corresponding symbols around text :
      • reduced / -text reduced-
      • none / ~text without change~
      • moderate / +text stronger+
      • strong / ++text much stronger++
    • example : [emphasis:"moderate"] / +bonjour+
  • say-as :
    • (text to be said as)[modifier]
    • modifier can be :
      • address
      • number
      • characters
      • fraction
      • ordinal
      • telephone
      • time
      • date
    • example : I need this answer (ASAP)[characters] / My phone number is (0386300000)[telephone]
  • ipa :
    • the International Phonetic Alphabet (ipa) allows you to force the pronunciation of a word / sentence
    • example : I love (paintball)[ipa:"peɪntbɔːl"]
  • emotions :
    • [emotion:"style role/styledegree"]
    • the style is mandatory, and depends on the voice speaking at that time (ex: fr-FR-DeniseNeural can only use 'sad' and 'cheerful' while ja-JP-NanamiNeural can use
      'chat', 'cheerful' and 'customerservice')
    • role and styledegree are optional. Role is a string, while styledegree is a number. Note that 'role' is restricted to very few voices
    • example : (It's so cool ! We are going to a great park today !)[voice:"en-US-JennyNeural";emotion:"excited 2"]
  • audio :
  • backgroundaudio :
    • [backgroundaudio:"src volume fadein fadeout"]
    • src is mandatory; the other fields are optional, but every field to the left must be provided before using one on the right (ex: to use fadein,
      you must have provided a value for src and volume)
    • only one backgroundaudio tag possible
    • example : [backgroundaudio:"https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3 0.5 2000 1500"]
  • lexicon :
    • [lexicon:"url to the lexicon xml file"]
    • the lexicon file is restricted to one language (en-US, fr-FR, ...) so it won't be used if the voice uses another language
    • it does nothing when using a multilingual voice (ex: JennyMultilingualNeural), even if the lang tag of this voice is the same as the one in the lexicon file
    • the lexicon inputs are case-sensitive, for example 'hello' and 'Hello' must be treated separately
    • example : [lexicon:"https://cdn.retorik.ai/retorik-framework/lexicon-en-US.xml"] Hi everybody ! BTW how are you today ?
  • bookmark :
    • [bookmark:"bookmark text"]
    • example : Bookmark after city name : first Paris [bookmark:"city1"], then Berlin [bookmark:"city2"]
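
Some of the tags above follow rules that are easy to get wrong when the Speech Markdown is assembled by hand. The helpers below are a minimal sketch under stated assumptions: `withModifiers`, `silence` and `backgroundAudio` are hypothetical names, not part of the package, and they only build strings in the syntax described above without calling the parser.

```typescript
// Hypothetical helpers (not part of the package). They only build
// Speech Markdown strings; they do not call the parser.

// Builds (text)[key:"value";key:"value"], as used for prosody, voice or emotion.
function withModifiers(text: string, modifiers: Record<string, string>): string {
  const parts = Object.keys(modifiers).map(key => key + ':"' + modifiers[key] + '"')
  return '(' + text + ')[' + parts.join(';') + ']'
}

type SilenceType = 'Leading' | 'Tailing' | 'SentenceBoundary'

// Builds [silence:"type value"]; the value must be an integer lower than 5000ms.
function silence(type: SilenceType, milliseconds: number): string {
  const isInteger = Math.floor(milliseconds) === milliseconds
  if (!isInteger || milliseconds < 0 || milliseconds >= 5000) {
    throw new RangeError('silence value must be an integer lower than 5000ms')
  }
  return '[silence:"' + type + ' ' + milliseconds + 'ms"]'
}

// Builds [backgroundaudio:"src volume fadein fadeout"].
// Optional fields must be filled from left to right.
function backgroundAudio(src: string, volume?: number, fadein?: number, fadeout?: number): string {
  const optional = [volume, fadein, fadeout]
  const fields: string[] = [src]
  for (let i = 0; i < optional.length; i++) {
    const value = optional[i]
    if (value === undefined) {
      // Everything to the right of a missing field must be missing too.
      if (optional.slice(i + 1).some(v => v !== undefined)) {
        throw new Error('all fields on the left must be provided first')
      }
      break
    }
    fields.push(String(value))
  }
  return '[backgroundaudio:"' + fields.join(' ') + '"]'
}
```

For instance, `withModifiers('this will be spoken slow and high', { rate: 'slow', pitch: 'high' })` rebuilds the prosody example from the list, and `backgroundAudio(src, undefined, 2000)` throws because fadein is given without volume.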

License

Licensed under the MIT License. See the LICENSE file for details.

Package last updated on 01 Mar 2024