github.com/giulianopz/go-gsst

v0.1.0

gstt

A Go client to call the Google Speech API for free.

The Google Speech API (full-duplex version) is meant to provide speech recognition through the Web Speech API in the Google Chrome browser. It is different from the Google Cloud Speech-to-Text API.

Disclaimer: The Google Speech API is an internal, entirely unsupported API that may change or disappear at any time.

Usage

Import it as a package:


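package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/giulianopz/go-gstt/pkg/client"
	"github.com/giulianopz/go-gstt/pkg/opts" // assumed import path for the opts package used below
	"github.com/giulianopz/go-gstt/pkg/transcription"
)

func main() {
	var (
		httpC   = client.New()
		in      io.Reader                            // audio input, e.g. an open FLAC file or os.Stdin
		options *opts.Options                        // configure transcription parameters
		out     = make(chan *transcription.Response) // receive results from channel
	)

	// Transcribe runs in its own goroutine and sends results on out.
	go httpC.Transcribe(in, out, options)

	// Print every alternative of every result as it arrives.
	for resp := range out {
		for _, result := range resp.Result {
			for _, alt := range result.Alternative {
				fmt.Printf("confidence=%f, transcript=%s\n", alt.Confidence, strings.TrimSpace(alt.Transcript))
			}
		}
	}
}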
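
The loop above prints every alternative. If only the most likely transcript is needed, a small helper can pick the alternative with the highest confidence. The sketch below is self-contained and does not import the package: `Alternative`, `Result`, and `bestAlternative` are hypothetical stand-ins that mirror only the field names used in the snippet above (`Alternative`, `Confidence`, `Transcript`), so the real types may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Alternative and Result are hypothetical stand-ins that mirror only the
// fields used in the usage snippet; they are not the package's own types.
type Alternative struct {
	Transcript string
	Confidence float64
}

type Result struct {
	Alternative []Alternative
}

// bestAlternative returns the alternative with the highest confidence,
// and false if the result carries no alternatives at all.
func bestAlternative(r Result) (Alternative, bool) {
	if len(r.Alternative) == 0 {
		return Alternative{}, false
	}
	best := r.Alternative[0]
	for _, alt := range r.Alternative[1:] {
		if alt.Confidence > best.Confidence {
			best = alt
		}
	}
	return best, true
}

func main() {
	// Sample data made up for illustration.
	r := Result{Alternative: []Alternative{
		{Transcript: " hello word ", Confidence: 0.61},
		{Transcript: " hello world ", Confidence: 0.93},
	}}
	if best, ok := bestAlternative(r); ok {
		fmt.Printf("confidence=%.2f, transcript=%s\n", best.Confidence, strings.TrimSpace(best.Transcript))
	}
}
```

With real responses, the same comparison could be applied to each `result` inside the `for resp := range out` loop.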

Use it as a command:

$ git clone https://github.com/giulianopz/go-gstt
$ cd go-gstt
$ go build -o gstt .
$ mv gstt /usr/local/bin
# or just `go install github.com/giulianopz/go-gstt@latest`, if you don't want to rename the binary
$ gstt -h
Usage:
    gstt [OPTION]... --interim --continuous [--file FILE]

Options:
        --verbose
        --file, path of audio file to trascript
        --key, API key to authenticates request (default is the one built into any Chrome installation)
        --language, language of the recording transcription, use the standard webcodes for your language, i.e. 'en-US' for English-US, 'ru' for Russian, etc. please, see https://en.wikipedia.org/wiki/IETF_language_tag
        --continuous, to keep the stream open and transcoding as long as there is no silence
        --interim, to send back results before its finished, so you get a live stream of possible transcriptions as it processes the audio
        --max-alts, how many possible transcriptions do you want
        --pfilter, profanity filter ('0'=off, '1'=medium, '2'=strict)
        --user-agent, user-agent for spoofing
        --sample-rate, audio sampling rate
# transcribe audio from a single FLAC file
$ gstt --interim --continuous --file $FILE
# transcribe audio from microphone input (recorded with sox, removing silence)
$ rec -c 1 --encoding signed-integer --bits 16 --rate 16000 -t flac - silence 1 0.1 1% -1 0.5 1% | gstt --interim --continuous

Demo

Live-caption speech by redirecting the speaker output to the microphone input with PulseAudio Volume Control (pavucontrol):

livecapdemo

(how-to-gif)

Credits

As far as I know, this API has been around for a long time.

Mike Pultz was possibly the first one to discover it in 2011. Subsequently, Travis Payton published a detailed report on the subject.

I wrote about it on my blog.

FAQs

Package last updated on 05 Nov 2024
