🚀 Big News: Socket Acquires Coana to Bring Reachability Analysis to Every Appsec Team.Learn more
Socket
DemoInstallSign in
Socket

github.com/Alliera/speech-to-text

Package Overview
Dependencies
Alerts
File Explorer
Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

github.com/Alliera/speech-to-text

v0.0.0-20240627131143-19771de53b09
Source
Go
Version published
Created
Source

Speech to text Service

This service implement REST API for recognition audio files using google cloud speech recognition

Installation

  • First you need to have Google Cloud account
  • Create credentials.json file according to documentation
  • Enable speech-to-text service (press "Go to console" button and enable Cloud Speech-to-Text API)
  • Create google bucket using documentation
  • Run export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json, where /path/to/credentials.json is path to json file from p.2
  • Run export BUCKET_NAME=you_bucket, where you_bucket is bucket name from to json file from p.4
  • Run docker-compose build
  • Run docker-compose up -d. For watching the application logs use docker-compose logs -f command

Google credentials

To run the application, a prerequisite is the presence of two environment variables. GOOGLE_APPLICATION_CREDENTIALS - The path to the main credentials file, which will be used for all enterpises, for which they are not specified in a separate config (see below about it).
BUCKET_NAME - Also a required variable and the name of the bucket, which will be used for the name of the bucket, in case it is not configured separately for enterprises.

Add credentials for specific enterprises

there are 2 ways to add credentials.

  • Using config
  • Using API (which adds these configs to the config in p.1)

Using config

config/buckets.json should contain the configuration for buckets, for example:

{
  "3": "bucket-name-for-enterprise-3",
  "4": "bucket-name-for-enterprise-4"
}

config/credentials/<enterprise_id>.json should contain credentials files for example config/credentials/3.json - is credentials for enterprise with id 3

Using API

POST localhost:7070/getTexts

Body Example:

{
   "credentials":{
      "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
      "auth_uri":"https://accounts.google.com/o/oauth2/auth",
      "client_email":"speech-to-text@seraphic-vertex-234234.iam.gserviceaccount.com",
      "client_id":"12341234123412341234",
      "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/speech-to-text%40seraphic-vertex-234234.iam.gserviceaccount.com",
      "private_key":"-----BEGIN PRIVATE KEY-----\nSomePrivateKey...\n-----END PRIVATE KEY-----\n",
      "private_key_id":"SomePrivateKey",
      "project_id":"seraphic-vertex-234234",
      "token_uri":"https://oauth2.googleapis.com/token",
      "type":"service_account"
      }, 
   "bucketName":"someBucketName", 
   "enterpriseId": 1
}

This endpoint checks for the presence of a bucket and the validity of the credentials. If something is invalid, the server will return a 409 error code. In case successful addition, 200 code is returned.

Speech to text Usage

REST API has only one endpoint for speech recognition:

POST localhost:7070/getTexts

Body Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "language": "en"
  },
  {
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "language": "en"
  }
]

uuid is optional.

fileUrl and language is required. Supported languages: en, it, de, fr, nl, es, ca, gl, pt, pl, ro, el, da, eu, ru, bg, sl, sr, hr.

Response Example:

[
  {
    "uuid": "A23D3",
    "fileUrl": "https://some-site.com/some_audio.wav",
    "text": "Good morning, and welcome to WWDC. WDC is incredibly important and our users..",
    "duration": 15,
    "language": "en"
  },
  {
    "uuid": "",
    "fileUrl": "https://some-site.com/some_audio2.wav",
    "text": "It's sure that we bring some of our biggest. I have a chance to live and we have not stopped, <.....>",
    "duration": 45,
    "language": "en"
  }
]

text is recogtized text

duration is duration of vaw file which is a multiple of 15, according to Google tariffication

FAQs

Package last updated on 27 Jun 2024

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts