speech-to-element

Add real-time speech-to-text functionality to your website with no effort



Speech To Element is an all-purpose npm library that can transcribe speech into text right out of the box! Try it out on the official website.

:zap: Services

https://github.com/OvidijusParsiunas/speech-to-element/assets/18709577/e2e618f8-b61c-4877-804b-26eeefbb0afa

:computer: How to use

NPM:

npm install speech-to-element

import SpeechToElement from 'speech-to-element';

const targetElement = document.getElementById('target-element');
SpeechToElement.toggle('webspeech', {element: targetElement});

CDN:

<script type="module" src="https://cdn.jsdelivr.net/gh/ovidijusparsiunas/speech-to-element@master/component/bundle/index.min.js"></script>

const targetElement = document.getElementById('target-element');
window.SpeechToElement.toggle('webspeech', {element: targetElement});

When using Azure, you will also need to install its Speech SDK. Read more in the Azure SDK section.
Make sure to check out the examples directory to browse templates for React, Next.js and more.

:construction_worker: Local setup

# Install node dependencies:
$ npm install

# Serve the component locally (from index.html):
$ npm run start

# Build the component into a module (dist/index.js):
$ npm run build:module

:beginner: API

Methods

Used to control Speech To Element transcription:

| Name | Description |
| --- | --- |
| startWebSpeech({Options & WebSpeechOptions}) | Start Web Speech API |
| startAzure({Options & AzureOptions}) | Start Azure API |
| toggle("webspeech", {Options & WebSpeechOptions}) | Start/Stop Web Speech API |
| toggle("azure", {Options & AzureOptions}) | Start/Stop Azure API |
| stop() | Stops all speech services |
| endCommandMode() | Ends the command mode |

Examples:

SpeechToElement.startWebSpeech({element: targetElement, displayInterimResults: false});
SpeechToElement.startAzure({element: targetElement, region: 'westus', token: 'token'});
SpeechToElement.toggle('webspeech', {element: targetElement, language: 'en-US'});
SpeechToElement.toggle('azure', {element: targetElement, region: 'eastus', subscriptionKey: 'key'});
SpeechToElement.stop();
SpeechToElement.endCommandMode();
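
The methods above can be combined with the onStart and onStop callbacks to keep track of whether recording is active, for example to style a microphone button. The following is a minimal sketch, not part of the library: createSpeechToggle is a hypothetical helper name, and the library object is passed in as a parameter so the logic can be exercised without a browser.

```javascript
// Hypothetical helper (not part of speech-to-element): wraps toggle() and
// tracks the recording state through the documented onStart/onStop callbacks.
function createSpeechToggle(speechToElement, service, options) {
  let active = false;
  const merged = {
    ...options,
    onStart: () => {
      active = true;
      if (options.onStart) options.onStart(); // keep the caller's own callback
    },
    onStop: () => {
      active = false;
      if (options.onStop) options.onStop();
    },
  };
  return {
    toggle: () => speechToElement.toggle(service, merged),
    isActive: () => active,
  };
}

// Browser usage (assumes a button with the id "mic-button" exists):
// const mic = createSpeechToggle(SpeechToElement, 'webspeech', {element: targetElement});
// document.getElementById('mic-button').onclick = mic.toggle;
```
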

Object Types

Options:

Generic options for the speech to element functionality:

| Name | Type | Description |
| --- | --- | --- |
| element | Element \| Element[] | Transcription target element. By defining multiple inside an array the user can switch between them in the same session by clicking on them. |
| autoScroll | boolean | Controls whether the element automatically scrolls to the new text. |
| displayInterimResults | boolean | Controls whether interim results are displayed. |
| textColor | TextColor | Object defining the result text colors. |
| translations | {[key: string]: string} | Case-sensitive one-to-one map of words that will automatically be translated to others. |
| commands | Commands | Sets the phrases that will trigger various chat functionality. |
| onStart | () => void | Triggered when speech recording has started. |
| onStop | () => void | Triggered when speech recording has stopped. |
| onResult | (text: string, isFinal: boolean) => void | Triggered when a new result is transcribed and inserted into the element. |
| onPreResult | (text: string, isFinal: boolean) => PreResult \| void | Triggered before result text insertion. This function can be used to control the speech service based on what was spoken via the PreResult object. |
| onCommandModeTrigger | (isStart: boolean) => void | Triggered when command mode is initiated and stopped. |
| onPauseTrigger | (isStart: boolean) => void | Triggered when the pause command is initiated and stopped via the resume command. |
| onError | (message: string) => void | Triggered when an error has occurred. |

Examples:

SpeechToElement.toggle('webspeech', {element: targetElement, translations: {hi: 'bye', Hi: 'Bye'}});
SpeechToElement.toggle('webspeech', {onResult: (text) => console.log(text)});
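
Because the translations map is case-sensitive, the first example above lists both hi and Hi. A small helper can generate the capitalized entries automatically. This is only a sketch: withCapitalizedVariants is a hypothetical name and is not provided by the library.

```javascript
// Hypothetical helper (not part of speech-to-element): for every entry, also
// add a variant with a capitalized first letter, since the map is case-sensitive.
function withCapitalizedVariants(translations) {
  const capitalize = (word) => word.charAt(0).toUpperCase() + word.slice(1);
  const result = {};
  for (const [from, to] of Object.entries(translations)) {
    result[from] = to;
    const capitalizedFrom = capitalize(from);
    // Never overwrite an entry that the caller defined explicitly.
    if (!(capitalizedFrom in translations)) {
      result[capitalizedFrom] = capitalize(to);
    }
  }
  return result;
}

// Browser usage:
// SpeechToElement.toggle('webspeech', {
//   element: targetElement,
//   translations: withCapitalizedVariants({hi: 'bye'})
// });
```
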

TextColor:

Object used to set the color for transcription result text (does not work for input and textarea elements):

| Name | Type | Description |
| --- | --- | --- |
| interim | string | Temporary text color |
| final | string | Final text color |

Example:

SpeechToElement.toggle('webspeech', {
  element: targetElement, textColor: {interim: 'grey', final: 'black'}
});

Commands:

https://github.com/OvidijusParsiunas/speech-to-element/assets/18709577/cca6bc40-ceb7-4d48-92e4-31c5f66366eb

Object used to set the phrases of commands that will control transcription and input functionality:

| Name | Type | Description |
| --- | --- | --- |
| stop | string | Stops the speech service. |
| pause | string | Temporarily stops the transcription and re-enables it after the phrase for resume is spoken. |
| resume | string | Re-enables transcription after it has been stopped by the pause or commandMode commands. |
| reset | string | Removes the transcribed text (since the last element cursor move). |
| removeAllText | string | Removes all element text. |
| commandMode | string | Activates the command mode, which stops transcription and waits for a command to be executed. Use the phrase for resume to leave the command mode. |
| settings | CommandSettings | Controls how command mode is used. |

Example:

SpeechToElement.toggle('webspeech', {
  element: targetElement,
  commands: {
    pause: 'pause',
    resume: 'resume',
    removeAllText: 'remove text',
    commandMode: 'command'
  }
});

CommandSettings:

Object used to configure how the command phrases are interpreted:

| Name | Type | Description |
| --- | --- | --- |
| substrings | boolean | Toggles whether command phrases can be part of spoken words or if they are whole words. E.g. when this is set to true and your command phrase is "stop" - when you say "stopping" the command will be executed. However if it is set to false - the command will only be executed if you say "stop". |
| caseSensitive | boolean | Toggles if command phrases are case sensitive. E.g. if this is set to true and your command phrase is "stop" - when the service recognizes your speech as "Stop" it will not execute your command. On the other hand if it is set to false it will execute. |

Example:

SpeechToElement.toggle('webspeech', {
  element: targetElement,
  commands: {
    removeAllText: 'remove text',
    settings: {
      substrings: true,
      caseSensitive: false
    }
  }
});
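
To make the substrings and caseSensitive semantics concrete, the sketch below reproduces the behavior described in the table with a standalone function. It is an illustration only and is not the library's implementation: matchesCommand is a hypothetical name, and the whole-word check (splitting on whitespace and ignoring trailing punctuation) is an assumption about what "whole words" means.

```javascript
// Illustration only: mirrors the documented behavior, not the library's source.
function matchesCommand(spokenText, phrase, {substrings, caseSensitive}) {
  const text = caseSensitive ? spokenText : spokenText.toLowerCase();
  const target = caseSensitive ? phrase : phrase.toLowerCase();
  if (target.trim().length === 0) return false;

  // substrings: true -> the phrase may appear inside a longer word ("stopping").
  if (substrings) return text.includes(target);

  // substrings: false -> the phrase must match whole words only.
  // Assumption: words are separated by whitespace; trailing punctuation is ignored.
  const toWords = (value) =>
    value.split(/\s+/).map((word) => word.replace(/[.,!?;:]+$/, '')).filter(Boolean);
  const words = toWords(text);
  const targetWords = toWords(target);
  for (let i = 0; i + targetWords.length <= words.length; i += 1) {
    if (targetWords.every((word, j) => words[i + j] === word)) return true;
  }
  return false;
}
```

With the phrase "stop", the spoken text "I am stopping" matches only when substrings is true, and "Stop" matches only when caseSensitive is false.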

PreResult:

Result object for the onPreResult function. This can be used to control the speech service and facilitate custom commands for your application:

| Name | Type | Description |
| --- | --- | --- |
| stop | boolean | Stops the speech service |
| restart | boolean | Restarts the speech service |
| removeNewText | boolean | Toggles whether the newly spoken (interim) text is removed when either of the above properties is set to true. |

Example of creating a custom command:

SpeechToElement.toggle('webspeech', {
  element: targetElement,
  onPreResult: (text) => {
    if (text.toLowerCase().includes('custom command')) {
      SpeechToElement.endCommandMode();
      // your custom code here
      return {restart: true, removeNewText: true};
    }
  }
});
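
When an application has several custom commands, the same pattern can be generalized into a handler that looks up spoken phrases in a map. The following is a sketch under assumptions: createPreResultHandler is a hypothetical helper, phrases are matched case-insensitively as substrings, and every match restarts the service and removes the new text, as in the example above.

```javascript
// Hypothetical helper (not part of speech-to-element): builds an onPreResult
// handler from a map of {phrase: action}.
function createPreResultHandler(customCommands) {
  return (text, isFinal) => {
    const spoken = text.toLowerCase();
    for (const [phrase, action] of Object.entries(customCommands)) {
      if (spoken.includes(phrase.toLowerCase())) {
        action(text, isFinal);
        // PreResult object: restart the service and drop the command text.
        return {restart: true, removeNewText: true};
      }
    }
    // No PreResult returned (the signature allows void), so nothing is changed.
    return undefined;
  };
}

// Browser usage:
// SpeechToElement.toggle('webspeech', {
//   element: targetElement,
//   onPreResult: createPreResultHandler({
//     'send message': () => console.log('sending'),
//   })
// });
```
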

WebSpeechOptions:

Custom options for the Web Speech API:

| Name | Type | Description |
| --- | --- | --- |
| language | string | This is the recognition language. See the following QA for the full list. |

Example:

SpeechToElement.toggle('webspeech', {element: targetElement, language: 'en-GB'});

AzureOptions:

Options for the Azure Cognitive Speech Services API. This object REQUIRES the region property and one of the retrieveToken, subscriptionKey or token properties to be defined:

| Name | Type | Description |
| --- | --- | --- |
| region | string | Location/region of your Azure speech resource. |
| retrieveToken | () => Promise<string> | Function used to retrieve a new token for your Azure speech resource. It is the recommended property to use as it can retrieve the token from a secure server that will hide your credentials. Check out the starter server templates to start a local server in seconds. |
| subscriptionKey | string | Subscription key for your Azure speech resource. |
| token | string | Temporary token for the Azure speech resource. |
| language | string | BCP-47 string value to denote the recognition language. You can find the full list here. |
| stopAfterSilenceMs | number | Milliseconds of silence required for the speech service to automatically stop. Default is 25000ms (25 seconds). |

Examples:

SpeechToElement.toggle('azure', {
  element: targetElement,
  region: 'eastus',
  token: 'token',
  language: 'ja-JP'
});

SpeechToElement.toggle('azure', {
  element: targetElement,
  region: 'southeastasia',
  // Fetch the token from your own server so that credentials stay off the client
  retrieveToken: async () => {
    return fetch('http://localhost:8080/token')
      .then((res) => res.text())
      .catch((error) => console.error(error));
  }
});
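
Since the object requires region plus one credential property, it can help to check the options before starting the service. The following is a minimal sketch: validateAzureOptions is a hypothetical helper that only encodes the requirement stated above and does not reflect how the library itself validates its input.

```javascript
// Hypothetical helper (not part of speech-to-element): returns a list of
// problems with an AzureOptions object, or an empty list if it looks usable.
function validateAzureOptions(options) {
  const problems = [];
  if (typeof options.region !== 'string' || options.region.length === 0) {
    problems.push('region is required');
  }
  const hasCredential =
    typeof options.retrieveToken === 'function' ||
    typeof options.subscriptionKey === 'string' ||
    typeof options.token === 'string';
  if (!hasCredential) {
    problems.push('one of retrieveToken, subscriptionKey or token is required');
  }
  return problems;
}

// Browser usage:
// const azureOptions = {element: targetElement, region: 'eastus', token: 'token'};
// const problems = validateAzureOptions(azureOptions);
// if (problems.length === 0) SpeechToElement.toggle('azure', azureOptions);
// else console.error(problems.join('; '));
```
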

Example server templates for the retrieveToken property:

Express, Nest, Flask, Spring, Go, Next
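
For orientation, this is roughly what such a server does when the client calls retrieveToken: it exchanges the subscription key for a short-lived token and returns it as plain text. The sketch below is not taken from the templates. The function names are hypothetical, and it assumes the standard Azure Speech token endpoint (sts/v1.0/issueToken with the Ocp-Apim-Subscription-Key header) and a runtime with a global fetch, such as Node 18 or later.

```javascript
// Hypothetical server-side helpers; run these on your server, never in the
// browser, so that the subscription key is not exposed.
function buildIssueTokenRequest(region, subscriptionKey) {
  return {
    url: `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    init: {
      method: 'POST',
      headers: {'Ocp-Apim-Subscription-Key': subscriptionKey},
    },
  };
}

// fetchImpl is injectable for testing and defaults to the global fetch.
async function issueToken(region, subscriptionKey, fetchImpl = fetch) {
  const {url, init} = buildIssueTokenRequest(region, subscriptionKey);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  return response.text(); // the response body is the token itself
}
```

A route such as the /token endpoint used in the retrieveToken example above would call issueToken and send the resulting string back to the client.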

Location of subscriptionKey and region details in Azure Portal:

Credentials location in Azure Portal

:floppy_disk: Azure SDK

To use the Azure Cognitive Speech Services API, you will need to add the official Azure Speech SDK into your project and assign it to the window.SpeechSDK variable. Here are some simple ways you can achieve this:

  • Import from a dependency: If you are using a dependency manager, import and assign it to window.SpeechSDK:

    import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
    window.SpeechSDK = sdk;
    
  • Dynamic import from a dependency: If you are using a dependency manager, dynamically import and assign it to window.SpeechSDK:

    import('microsoft-cognitiveservices-speech-sdk').then((module) => {
       window.SpeechSDK = module;
    });
    
  • Script from a CDN: You can add a script tag to your markup or create one via JavaScript. The window.SpeechSDK property will be populated automatically:

    <script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>
    
    const script = document.createElement("script");
    script.src = "https://aka.ms/csspeech/jsbrowserpackageraw";
    document.body.appendChild(script);
    

If your project is using TypeScript, add this to the file where the module is used:

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
declare global {
  interface Window {
    SpeechSDK: typeof sdk;
  }
}
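
Whichever approach is used, window.SpeechSDK has to be assigned before the Azure service is started. One way to guarantee the ordering is to wrap the assignment in a small helper. This is a sketch only: ensureSpeechSDK is a hypothetical name, and the global object and loader are passed in as parameters so the logic does not depend on a browser.

```javascript
// Hypothetical helper (not part of speech-to-element): assigns the SDK to
// globalObject.SpeechSDK once, using the supplied loader if it is missing.
async function ensureSpeechSDK(globalObject, loadSdk) {
  if (!globalObject.SpeechSDK) {
    globalObject.SpeechSDK = await loadSdk();
  }
  return globalObject.SpeechSDK;
}

// Browser usage (inside an async function or a module with top-level await):
// await ensureSpeechSDK(window, () => import('microsoft-cognitiveservices-speech-sdk'));
// SpeechToElement.toggle('azure', {element: targetElement, region: 'eastus', token: 'token'});
```
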

Examples:

Example React project that uses a package bundler. It should work similarly for other UI frameworks:

Click for Live Example

VanillaJS approach with no bundler (this can also be used as fallback if above doesn't work):

Click for Live Example

:star: Example Product

Deep Chat - an AI-oriented chat component that uses Speech To Element to power its speech-to-text capabilities.

:heart: Contributions

Open source is built by the community for the community. All contributions to this project are welcome!
Additionally, if you have any suggestions for enhancements, ideas on how to take the project further or have discovered a bug, do not hesitate to create a new issue ticket and we will look into it as soon as possible!

Last updated on 28 Aug 2023
