github.com/Alliera/speech-to-text

v0.0.0-20240627131143-19771de53b09

Version published: last year

Created: 2 years ago

Speech to text Service

This service implements a REST API for transcribing audio files with Google Cloud Speech-to-Text.

Installation

1. You need a Google Cloud account.
2. Create a credentials.json file according to the documentation.
3. Enable the speech-to-text service (press the "Go to console" button and enable the Cloud Speech-to-Text API).
4. Create a Google bucket using the documentation.
5. Run export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json, where /path/to/credentials.json is the path to the JSON file from step 2.
6. Run export BUCKET_NAME=your_bucket, where your_bucket is the name of the bucket from step 4.
7. Run docker-compose build.
8. Run docker-compose up -d. To watch the application logs, use the docker-compose logs -f command.

Google credentials

The application requires two environment variables:

GOOGLE_APPLICATION_CREDENTIALS - The path to the main credentials file, which is used for every enterprise that has no credentials specified in a separate config (see below).
BUCKET_NAME - The name of the default bucket, which is used for every enterprise that has no bucket configured separately.
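As a pre-flight check before starting the containers, the two required variables can be verified with a few lines of Python. This is only a sketch: the helper name missing_required_vars is hypothetical and not part of the service, and it checks only that the variables are set, not that the file or bucket they point to exists.

```python
import os

# The two variables the service requires at startup.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "BUCKET_NAME")


def missing_required_vars(env):
    """Return the names of required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required_vars(os.environ)
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("Both required variables are set.")
```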

Add credentials for specific enterprises

There are two ways to add credentials:

1. Using config
2. Using API (which adds these settings to the config from option 1)

Using config

config/buckets.json should contain the configuration for buckets, for example:

{
  "3": "bucket-name-for-enterprise-3",
  "4": "bucket-name-for-enterprise-4"
}

config/credentials/<enterprise_id>.json should contain the credentials file for each enterprise. For example, config/credentials/3.json holds the credentials for the enterprise with id 3.
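The lookup these files imply, an enterprise-specific value first and the environment default otherwise, can be sketched in Python as follows. The function resolve_enterprise_config is hypothetical and only mirrors the layout described here. The service's actual resolution logic is not shown in this README and may differ.

```python
import json
from pathlib import Path


def resolve_enterprise_config(enterprise_id, config_dir, env):
    """Return (bucket_name, credentials_path) for an enterprise.

    Falls back to BUCKET_NAME and GOOGLE_APPLICATION_CREDENTIALS from env
    when no enterprise-specific entry exists.
    """
    config_dir = Path(config_dir)

    # buckets.json maps enterprise ids (as strings) to bucket names.
    buckets_file = config_dir / "buckets.json"
    buckets = json.loads(buckets_file.read_text()) if buckets_file.exists() else {}
    bucket = buckets.get(str(enterprise_id), env.get("BUCKET_NAME"))

    # Per-enterprise credentials live in credentials/<enterprise_id>.json.
    specific = config_dir / "credentials" / f"{enterprise_id}.json"
    if specific.exists():
        credentials = str(specific)
    else:
        credentials = env.get("GOOGLE_APPLICATION_CREDENTIALS")

    return bucket, credentials
```

With the example buckets.json above and a config/credentials/3.json file, enterprise 3 would resolve to its own bucket and credentials file. An enterprise without either entry would resolve to the two environment defaults.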

Using API

POST localhost:7070/getTexts

Body Example:

{
   "credentials":{
      "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
      "auth_uri":"https://accounts.google.com/o/oauth2/auth",
      "client_email":"speech-to-text@seraphic-vertex-234234.iam.gserviceaccount.com",
      "client_id":"12341234123412341234",
      "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/speech-to-text%40seraphic-vertex-234234.iam.gserviceaccount.com",
      "private_key":"-----BEGIN PRIVATE KEY-----\nSomePrivateKey...\n-----END PRIVATE KEY-----\n",
      "private_key_id":"SomePrivateKey",
      "project_id":"seraphic-vertex-234234",
      "token_uri":"https://oauth2.googleapis.com/token",
      "type":"service_account"
      }, 
   "bucketName":"someBucketName", 
   "enterpriseId": 1
}

This endpoint checks that the bucket exists and that the credentials are valid. If either check fails, the server returns a 409 error code. If the credentials are added successfully, it returns a 200 code.
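A client for this call might look like the sketch below, which uses only the Python standard library. The function names are hypothetical. The endpoint URL is passed in by the caller rather than hard-coded, and the Content-Type header is an assumption, since the README does not state one.

```python
import json
import urllib.error
import urllib.request


def build_credentials_payload(credentials, bucket_name, enterprise_id):
    """Assemble the request body: service-account JSON, bucket name, enterprise id."""
    return json.dumps({
        "credentials": credentials,
        "bucketName": bucket_name,
        "enterpriseId": enterprise_id,
    })


def post_json(url, body):
    """POST a JSON string to url and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses, so a 409 arrives here.
        return error.code


def describe_status(status_code):
    """Translate the two documented status codes into a short message."""
    if status_code == 200:
        return "credentials added"
    if status_code == 409:
        return "bucket missing or credentials invalid"
    return f"unexpected status {status_code}"
```

The credentials argument is the parsed content of a service-account JSON file, for example the result of json.load on the file created during installation.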

Speech to text Usage

The REST API has only one endpoint for speech recognition:

POST localhost:7070/getTexts

Body Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "language": "en"
  },
  {
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "language": "en"
  }
]

uuid is optional.

fileUrl and language are required. Supported languages: en, it, de, fr, nl, es, ca, gl, pt, pl, ro, el, da, eu, ru, bg, sl, sr, hr.
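These rules can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the service. It encodes only the constraints listed above, and the server may apply further checks.

```python
# Language codes listed as supported in this README.
SUPPORTED_LANGUAGES = {
    "en", "it", "de", "fr", "nl", "es", "ca", "gl", "pt", "pl",
    "ro", "el", "da", "eu", "ru", "bg", "sl", "sr", "hr",
}


def validate_items(items):
    """Return a list of problems found in a request body; empty means it looks valid."""
    problems = []
    for index, item in enumerate(items):
        # fileUrl and language are required; uuid is optional and not checked.
        for field in ("fileUrl", "language"):
            if not item.get(field):
                problems.append(f"item {index}: missing {field}")
        language = item.get("language")
        if language and language not in SUPPORTED_LANGUAGES:
            problems.append(f"item {index}: unsupported language {language!r}")
    return problems
```

Running it on the body example above returns an empty list, since both items carry fileUrl and a supported language.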

Response Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "text": "Good morning, and welcome to WWDC. WDC is incredibly important and our users..",
    "duration": 15,
    "language": "en"
  },
  {
    "uuid": "",
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "text": "It's sure that we bring some of our biggest. I have a chance to live and we have not stopped, <.....>",
    "duration": 45,
    "language": "en"
  }
]

text is the recognized text.

duration is the duration of the WAV file, reported as a multiple of 15 to match Google's billing increments.
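The arithmetic behind a duration that is always a multiple of 15 can be illustrated as below. Two points here are assumptions rather than statements from the README: that the unit is seconds, and that the value is rounded up rather than to the nearest multiple. Both function names are hypothetical.

```python
import math


def billable_duration(actual_seconds, increment=15):
    """Round a duration up to the next multiple of increment (assumed behaviour)."""
    if actual_seconds <= 0:
        return 0
    return math.ceil(actual_seconds / increment) * increment


def total_duration(results):
    """Sum the duration field over the items of a response."""
    return sum(item["duration"] for item in results)
```

Under these assumptions a 16-second file is reported as 30, and the response example above adds up to 60.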

Package last updated on 27 Jun 2024
