tts-react provides a hook (useTts) and component (TextToSpeech) to convert text to speech. In most cases you want the hook so you can use custom styling on the audio controls.
By default tts-react uses the SpeechSynthesis and SpeechSynthesisUtterance APIs. You can fall back to the HTMLAudioElement API by providing a fetchAudioData prop to the hook or component.
Install
npm i react react-dom tts-react
Demo (Storybook)
morganney.github.io/tts-react
Example
Hook
You can use the hook to create a Speak component that converts the text to speech on render:
import { useTts } from 'tts-react'
import type { TTSHookProps } from 'tts-react'

type SpeakProps = Pick<TTSHookProps, 'children'>

const Speak = ({ children }: SpeakProps) => (
  <>{useTts({ children, autoPlay: true }).ttsChildren}</>
)

const App = () => {
  return (
    <Speak>
      <p>This text will be spoken on render.</p>
    </Speak>
  )
}
Or create a more advanced component with controls for adjusting the speaking:
import { useTts } from 'tts-react'
import type { TTSHookProps } from 'tts-react'

interface CustomProps extends TTSHookProps {
  highlight?: boolean
}

const CustomTTSComponent = ({ children, highlight = false }: CustomProps) => {
  const { ttsChildren, state, onPlay, onStop, onPause } = useTts({
    children,
    markTextAsSpoken: highlight
  })

  return (
    <div>
      <>
        <button disabled={state.isPlaying} onClick={onPlay}>Play</button>
        <button onClick={onPause}>Pause</button>
        <button onClick={onStop}>Stop</button>
      </>
      {ttsChildren}
    </div>
  )
}

const App = () => {
  return (
    <CustomTTSComponent highlight>
      <p>Some text to be spoken and highlighted.</p>
    </CustomTTSComponent>
  )
}
Component
Use the TextToSpeech component to get up and running quickly:
import { TextToSpeech, Positions, Sizes } from 'tts-react'

const App = () => {
  return (
    <TextToSpeech
      markTextAsSpoken
      align="vertical"
      size={Sizes.SMALL}
      position={Positions.TL}>
      <p>Some text to be spoken.</p>
    </TextToSpeech>
  )
}
useTts
The hook returns the internal state of the audio being spoken, getters/setters of audio attributes, callbacks that can be used to control playing/stopping/pausing/etc. of the audio, and modified children if using markTextAsSpoken. The parameters accepted are described in the Props section. The response object is described by the TTSHookResponse type.
const useTts = ({
  lang,
  voice,
  children,
  markColor,
  markBackgroundColor,
  onError,
  onVolumeChange,
  onPitchChange,
  onRateChange,
  fetchAudioData,
  autoPlay = false,
  markTextAsSpoken = false
}: TTSHookProps): TTSHookResponse => {
  return {
    get,
    set,
    state,
    spokenText,
    ttsChildren,
    onPlay,
    onStop,
    onPause,
    onReset,
    onPlayStop,
    onPlayPause,
    onToggleMute
  }
}
interface TTSHookResponse {
  set: {
    lang: (value: string) => void
    rate: (value: number) => void
    pitch: (value: number) => void
    volume: (value: number) => void
    preservesPitch: (value: boolean) => void
  }
  get: {
    lang: () => string
    rate: () => number
    pitch: () => number
    volume: () => number
    preservesPitch: () => boolean
  }
  state: TTSHookState
  spokenText: string
  onPlay: () => void
  onStop: () => void
  onPause: () => void
  onReset: () => void
  onToggleMute: (callback?: (wasMuted: boolean) => void) => void
  onPlayStop: () => void
  onPlayPause: () => void
  ttsChildren: ReactNode
}

interface TTSHookState {
  voices: SpeechSynthesisVoice[]
  boundary: TTSBoundaryUpdate
  isPlaying: boolean
  isPaused: boolean
  isMuted: boolean
  isError: boolean
  isReady: boolean
}

interface TTSBoundaryUpdate {
  word: string
  startChar: number
  endChar: number
}
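As a small illustration of the get/set pair described above, the following sketch adjusts the speaking rate by a step while keeping it within bounds. The helper name nudgeRate and the RateControls type are hypothetical and not part of tts-react. The 0.1 to 10 range is the one documented for SpeechSynthesisUtterance.rate and is assumed here; it may not be appropriate when fetchAudioData is in use.

```typescript
// Minimal structural type covering only the parts of the hook response used here.
interface RateControls {
  get: { rate: () => number }
  set: { rate: (value: number) => void }
}

// Assumed bounds, taken from the documented range of SpeechSynthesisUtterance.rate.
const MIN_RATE = 0.1
const MAX_RATE = 10

// Hypothetical helper: change the rate by `delta`, clamp it, and return the new value.
const nudgeRate = (controls: RateControls, delta: number): number => {
  const next = Math.min(MAX_RATE, Math.max(MIN_RATE, controls.get.rate() + delta))

  controls.set.rate(next)

  return next
}
```

Inside a component this could be wired to a button, for example by calling nudgeRate({ get, set }, 0.25) in an onClick handler, where get and set come from the useTts response.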
fetchAudioData
Using fetchAudioData will bypass SpeechSynthesis and use the HTMLAudioElement.
(spokenText: string) => Promise<TTSAudioData>
When using fetchAudioData it must return TTSAudioData which has the following shape:
interface PollySpeechMark {
  end: number
  start: number
  time: number
  type: 'word'
  value: string
}

interface TTSAudioData {
  audio: string
  marks?: PollySpeechMark[]
}
The audio property must be a URL that can be applied to HTMLAudioElement.src, including a data URL. If using markTextAsSpoken then you must also return the marks that describe the word boundaries. PollySpeechMark objects have the same shape as the Speech Marks used by Amazon Polly, with the restriction that they must be of type: 'word'.
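A fetchAudioData implementation might look roughly like the sketch below. It assumes a hypothetical /api/tts endpoint that responds with JSON holding base64-encoded MP3 audio and Amazon Polly speech marks in their newline-delimited JSON form. The endpoint, its response fields, and the helper names parseWordMarks and toAudioData are assumptions for illustration and are not part of tts-react.

```typescript
// Shapes copied from the interfaces documented above.
interface PollySpeechMark {
  end: number
  start: number
  time: number
  type: 'word'
  value: string
}

interface TTSAudioData {
  audio: string
  marks?: PollySpeechMark[]
}

// Hypothetical helper: parse newline-delimited JSON speech marks, keeping only word marks.
const parseWordMarks = (jsonLines: string): PollySpeechMark[] =>
  jsonLines
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => JSON.parse(line) as { type: string })
    .filter((mark): mark is PollySpeechMark => mark.type === 'word')

// Hypothetical helper: wrap base64 MP3 audio in a data URL and attach marks when provided.
const toAudioData = (base64Mp3: string, jsonLines?: string): TTSAudioData => ({
  audio: `data:audio/mpeg;base64,${base64Mp3}`,
  ...(jsonLines ? { marks: parseWordMarks(jsonLines) } : {})
})

// Assumed endpoint and response shape; replace with your own service.
const fetchAudioData = async (spokenText: string): Promise<TTSAudioData> => {
  const response = await fetch('/api/tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: spokenText })
  })
  const { audio, marks } = (await response.json()) as { audio: string; marks?: string }

  return toAudioData(audio, marks)
}
```

The function can then be passed as the fetchAudioData prop to the hook or component. Filtering to word marks reflects the restriction stated above that marks must be of type 'word'.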
Props
Most of these are supported by the useTts hook, but those marked with an asterisk are exclusive to the TextToSpeech component.
* Only applies to TextToSpeech component.
Name | Required | Type | Default | Description |
---|---|---|---|---|
children | yes | ReactNode | none | Provides the text that will be spoken. |
lang | no | string | The one used by SpeechSynthesisUtterance.lang. | Sets the SpeechSynthesisUtterance.lang. Overrides voice when set and voice.lang does not match lang. |
voice | no | SpeechSynthesisVoice | None or the voice provided by audio from TTSAudioData. | The voice heard when the text is spoken. Calling set.lang may override this value. |
autoPlay | no | boolean | false | Whether the audio of the text should automatically be spoken when ready. |
markTextAsSpoken | no | boolean | false | Whether the word being spoken should be highlighted. |
markColor | no | string | none | Color of the text that is currently being spoken. Only applies with markTextAsSpoken. |
markBackgroundColor | no | string | none | Background color of the text that is currently being spoken. Only applies with markTextAsSpoken. |
fetchAudioData | no | (text: string) => Promise<TTSAudioData> | none | Function to return the optional PollySpeechMark[] and audio URL for the text to be spoken. See fetchAudioData for more details. |
* allowMuting | no | boolean | true | Whether an additional button will be shown on the component that allows muting the audio. |
* onMuteToggled | no | (wasMuted: boolean) => void | none | Callback when the user clicks the mute button shown from allowMuting being enabled. Can be used to toggle global or local state like whether autoPlay should be enabled. |
onError | no | (evt: CustomEvent<string>) => void | none | Callback when there is an error of any kind playing the spoken text. The error message (if any) will be provided in evt.detail. |
* align | no | 'horizontal' \| 'vertical' | 'horizontal' | How to align the controls within the TextToSpeech component. |
* size | no | 'small' \| 'medium' \| 'large' | 'medium' | The relative size of the controls within the TextToSpeech component. |
* position | no | 'topRight' \| 'topLeft' \| 'bottomRight' \| 'bottomLeft' | 'topRight' | The relative positioning of the controls within the TextToSpeech component. |
* useStopOverPause | no | boolean | false | Whether the controls should display a stop button instead of a pause button. On Android devices, SpeechSynthesis.pause() behaves like cancel(), so you can use this prop in that context. |
FAQ
Why does markTextAsSpoken sometimes highlight the wrong word?
The SpeechSynthesisUtterance boundary event may fire with skewed word boundaries for certain combinations of spokenText and lang or voice props. If you check the value of state.boundary.word in these cases, you will find the event is firing at unexpected boundaries, so there is no real solution other than to find a suitable voice for your given spokenText.
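One way to investigate a skewed boundary is to compare the reported word with the slice of spokenText that the offsets point at. The sketch below assumes startChar and endChar are character offsets into spokenText with endChar exclusive; that interpretation, and the helper names splitAtBoundary and boundaryMatchesText, are assumptions for illustration and not part of tts-react.

```typescript
// Shape copied from the TTSBoundaryUpdate interface documented above.
interface TTSBoundaryUpdate {
  word: string
  startChar: number
  endChar: number
}

// Hypothetical helper: split text into the parts before, inside, and after the boundary.
const splitAtBoundary = (text: string, { startChar, endChar }: TTSBoundaryUpdate) => ({
  before: text.slice(0, startChar),
  marked: text.slice(startChar, endChar),
  after: text.slice(endChar)
})

// Hypothetical helper: true when the offsets select exactly the reported word.
const boundaryMatchesText = (text: string, boundary: TTSBoundaryUpdate): boolean =>
  splitAtBoundary(text, boundary).marked === boundary.word
```

Logging boundaryMatchesText(spokenText, state.boundary) while trying different voices may help identify which voice and text combinations report boundaries that do not line up.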
Why does markTextAsSpoken not work on Chrome for Android?
This is a known issue that the Chromium team apparently does not plan to fix. You can use fetchAudioData to fall back to the HTMLAudioElement, or try a different browser.
Why can I not pause the audio when using SpeechSynthesis on Firefox and Chrome for Android?
See the compat table on MDN for SpeechSynthesis.pause():
"In Android, pause() ends the current utterance. pause() behaves the same as cancel()."
You can use the hook useTts to build custom controls that do not expose a pause, but only stop. If using the TextToSpeech component use the useStopOverPause prop for Android devices.
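A minimal sketch of how the Android case could be detected, assuming a simple user agent check is acceptable for your application. The helper name isAndroid is hypothetical, and user agent sniffing is a heuristic rather than a reliable feature test.

```typescript
// Hypothetical helper: heuristic check for Android based on the user agent string.
const isAndroid = (userAgent: string): boolean => /android/i.test(userAgent)
```

The result could then be passed to the component, for example as useStopOverPause={isAndroid(navigator.userAgent)}.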
Why is text from dangerouslySetInnerHTML not spoken?
tts-react does not speak text from dangerouslySetInnerHTML. Instead convert your HTML string into React elements via an html-to-react parser. See this example.
What's up with Safari?
Safari simply does not follow the spec completely (yet). As one example, Safari 15.6.1 on macOS Monterey 12.5.1 throws a SpeechSynthesisEvent during a SpeechSynthesisUtterance.error, while the spec says errors against utterances "must use the SpeechSynthesisErrorEvent interface".