ssml-split

Splits long texts with SSML tags into batches suitable for working with AWS Polly TTS and Google Cloud Text to Speech.

0.3.4

Source

npm

Version published: 6 years ago

Maintainers: 1

Created: 6 years ago

SSML Split

Splits SSML strings into batches that AWS Polly and Google's Text to Speech API can consume.

Features

Splits your large SSML into batches that AWS Polly and Google's Text to Speech API can consume.
Keeps you below the API character limits through a configurable hardLimit.
Produces as few batches as possible to limit the number of requests to the Text to Speech APIs.
Splits text at the nearest `.`, `,`, `;` or space. The split characters can be configured.
Written in TypeScript, so you get the type safety and documentation that come with it.

Based on polly-ssml-split by @oleglegun

Documentation

Installation - Walk through how to install SSML Split.
Usage - Read how SSML Split works with the available options.
Recommended Options - Use these options to get started quickly.
Contributing - Become familiar with how to contribute back to SSML Split
Code of Conduct - Be a good citizen by following these repository rules

Installation

Install the package with:

npm install ssml-split --save

Usage

Import the package and set the options. Use the .split() method to split your SSML string. You can tweak the softLimit to see what works for you. I suggest keeping the hardLimit at the character limit of the respective API:

import SSMLSplit from 'ssml-split';

const ssmlSplit = new SSMLSplit({
  softLimit: 4000, // Finds a possible split moment starting from 4000 characters
  hardLimit: 5000, // Google Text to Speech limitation
  includeSSMLTagsInCounter: true, // Set true when using Google Text to Speech API, set to false with AWS Polly
  breakParagraphsAboveHardLimit: true // Allow to split large paragraphs, set to false to keep your <p></p> intact
});

const batches = ssmlSplit.split('<speak>your long ssml here</speak>');

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `softLimit` | `number` | `1500` | The number of characters from which the script starts trying to break up your SSML into multiple parts. You can tweak this number to see what works for you. |
| `hardLimit` | `number` | `3000` | The maximum number of characters per SSML part. If any batch goes above this size, the script throws an error. |
| `includeSSMLTagsInCounter` | `boolean` | `false` | Set to `true` to include the SSML tag characters when calculating where to split the SSML. This is recommended when you work with Google's Text to Speech API. Set to `false` to count only text characters, which is recommended for AWS Polly. |
| `breakParagraphsAboveHardLimit` | `boolean` | `false` | Set to `true` to allow the script to break up large paragraphs by removing the `<p>` and replacing the `</p>` with a `<break strength="x-strong" />`, which results in the same pause. |
| `extraSplitChars` | `string` | `,;.` | Characters that can be used as split markers for plain text. |
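
The interplay of softLimit, hardLimit and extraSplitChars for plain text can be sketched roughly as follows. This is a simplified illustration and not the library's actual implementation: the helper name findSplitIndex and its exact behaviour (scan forward from softLimit, cut right after the first marker or space, throw when nothing is found before hardLimit) are assumptions made for this example, and SSML tags are ignored entirely.

```typescript
// Hypothetical helper, not part of ssml-split.
// Returns the index at which to cut `text`, so that text.slice(0, index)
// is the next batch and never exceeds hardLimit characters.
function findSplitIndex(
  text: string,
  softLimit: number,
  hardLimit: number,
  extraSplitChars: string = ',;.',
): number {
  // Short texts fit into a single batch.
  if (text.length <= softLimit) {
    return text.length;
  }

  // Scan from softLimit towards hardLimit for the first split marker or space.
  const end = Math.min(hardLimit, text.length);
  for (let i = softLimit; i < end; i++) {
    const ch = text.charAt(i);
    if (ch === ' ' || extraSplitChars.includes(ch)) {
      return i + 1; // cut right after the marker
    }
  }

  // No marker found, but the remaining text still fits below hardLimit.
  if (text.length <= hardLimit) {
    return text.length;
  }

  // Assumption: mirror the documented behaviour of erroring above hardLimit.
  throw new Error('No split position found between softLimit and hardLimit');
}

const sampleText = 'Hello world, this is a test.';
const cutAt = findSplitIndex(sampleText, 8, 15);
console.log(sampleText.slice(0, cutAt)); // "Hello world,"
```

A lower softLimit gives the search more room to find a natural break before hardLimit, at the cost of smaller batches and therefore more requests.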

About: includeSSMLTagsInCounter

Setting includeSSMLTagsInCounter: true includes the SSML tag characters in the calculation of when to split the SSML, which makes the library also work with Google's Text to Speech API.

For example: <speak><p>some text</p></speak>

The default behaviour counts that as 9 characters, which is fine for AWS Polly, but not for Google's Text to Speech API.

With includeSSMLTagsInCounter: true it is counted as 31 characters, just like Google's Text to Speech API counts it.

This should prevent you from seeing this error when using Google's Text to Speech API:

INVALID_ARGUMENT: 5000 characters limit exceeded.
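
The difference between the two counting modes can be illustrated with a small sketch. The helper countCharacters is hypothetical and only approximates the idea (a simple regular expression strips anything that looks like a tag); it is not how ssml-split parses SSML internally.

```typescript
// Hypothetical helper, not part of ssml-split.
// Counts characters either including or excluding SSML tags.
// Note: string length is measured in UTF-16 code units.
function countCharacters(ssml: string, includeSSMLTagsInCounter: boolean): number {
  if (includeSSMLTagsInCounter) {
    // Count every character, tags included.
    return ssml.length;
  }
  // Naively remove everything between "<" and ">" and count the remaining text.
  return ssml.replace(/<[^>]*>/g, '').length;
}

const exampleSsml = '<speak><p>some text</p></speak>';
console.log(countCharacters(exampleSsml, false)); // 9
console.log(countCharacters(exampleSsml, true)); // 31
```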

About: breakParagraphsAboveHardLimit

By adding the option breakParagraphsAboveHardLimit: true you allow the script to break up large paragraphs by removing the <p> and replacing the </p> with a <break strength="x-strong" />, which results in the same pause. This allows the script to properly split large paragraphs.

If you work with large paragraphs and you do not use this option, you might run into errors like SSML tag appeared to be too long.
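
The tag replacement described above can be sketched as a plain string transformation. The function name replaceParagraphsWithBreaks is made up for this example, and the sketch rewrites every paragraph, whereas the option name suggests the library only does so for paragraphs above the hardLimit. It also assumes <p> tags without attributes.

```typescript
// Hypothetical helper, not part of ssml-split.
// Removes every opening <p> and turns every closing </p> into a strong break,
// so the text is no longer wrapped in an element that must stay in one piece.
function replaceParagraphsWithBreaks(ssml: string): string {
  return ssml
    .replace(/<p>/g, '')
    .replace(/<\/p>/g, '<break strength="x-strong" />');
}

console.log(replaceParagraphsWithBreaks('<speak><p>One.</p><p>Two.</p></speak>'));
// <speak>One.<break strength="x-strong" />Two.<break strength="x-strong" /></speak>
```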

Recommended options

AWS

new SSMLSplit({
  softLimit: 2000,
  hardLimit: 3000, // AWS Polly limitation
  includeSSMLTagsInCounter: false, // Do not count SSML tags as characters
  breakParagraphsAboveHardLimit: true, // optional, but recommended when you have large <p>'s
})

Google

new SSMLSplit({
  softLimit: 4000,
  hardLimit: 5000, // Google Text to Speech API limitation
  includeSSMLTagsInCounter: true, // Count SSML tags as characters
  breakParagraphsAboveHardLimit: true, // optional, but recommended when you have large <p>'s
})
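
If you target both services from one code base, the two presets above could be kept in a single lookup. This is only a sketch: the names SplitOptions, Provider and recommendedOptions are invented for this example and are not exported by ssml-split. The values are copied from the recommended options above.

```typescript
// Hypothetical types and helper, not part of ssml-split.
// The fields mirror the options documented in the table above.
interface SplitOptions {
  softLimit: number;
  hardLimit: number;
  includeSSMLTagsInCounter: boolean;
  breakParagraphsAboveHardLimit: boolean;
}

type Provider = 'aws' | 'google';

const RECOMMENDED_OPTIONS: Record<Provider, SplitOptions> = {
  aws: {
    softLimit: 2000,
    hardLimit: 3000, // AWS Polly limitation
    includeSSMLTagsInCounter: false, // do not count SSML tags as characters
    breakParagraphsAboveHardLimit: true,
  },
  google: {
    softLimit: 4000,
    hardLimit: 5000, // Google Text to Speech API limitation
    includeSSMLTagsInCounter: true, // count SSML tags as characters
    breakParagraphsAboveHardLimit: true,
  },
};

// Returns a copy so callers can tweak values without changing the presets.
function recommendedOptions(provider: Provider): SplitOptions {
  return { ...RECOMMENDED_OPTIONS[provider] };
}

console.log(recommendedOptions('google').hardLimit); // 5000
```

The returned object has the same shape as the constructor options shown above, so it should be possible to pass it straight to new SSMLSplit(...).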

About

The polly-ssml-split by @oleglegun library already handles splitting of SSML correctly for AWS Polly, but wasn't working properly for Google's Text to Speech. So I just modified the package to fit my needs.

Changes compared to `polly-ssml-split`:

Added the includeSSMLTagsInCounter option to count characters based on the complete SSML tag and not just the included text characters, which is required if you work with Google's Text to Speech API.
Rewrote the library in TypeScript, so you get correct type checking in your TypeScript project.
Removed the .configure method; options are now passed to the class constructor instead.
Added the breakParagraphsAboveHardLimit option to break up large paragraphs by removing the <p> and replacing the </p> with a <break strength="x-strong" />, which results in the same pause. This allows the script to properly split the paragraph and to send fewer batches to the Text to Speech APIs.
Added more tests using Jest.

Development

Any contribution is appreciated! Please read our CONTRIBUTING.md on how to contribute.

Use a test-driven approach when developing new features or fixing bugs.

Develop:

$ npm install
$ npm run dev

Run tests on file change:

$ npm run test:watch

Run all tests:

$ npm test

Keywords

ssml

aws

google

google-text-to-speech

aws-polly

text-to-speech

Package last updated on 06 Jan 2020
