
py-transcribe


framework for synchronous batch speech-to-text transcription using backends like AWS, Watson, etc.

1.5.0
PyPI

Maintainers: 1

py-transcribe

Implementation-agnostic framework for synchronous batch speech-to-text transcription with backend services such as AWS, Watson, etc.

This module itself does NOT include a full implementation or an integration with any transcription service. The intention instead is that you include a specific implementation in your project. For example, for AWS Transcribe, use py-transcribe-aws (https://github.com/ICTLearningSciences/py-transcribe-aws).

Python Installation

pip install py-transcribe

Usage

You first need to install a concrete implementation of py-transcribe. If you are using AWS, you can install py-transcribe-aws like this:

pip install py-transcribe-aws

Once the implementation is installed, you can configure it in one of two ways:

Setting the implementation module path

Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.

export TRANSCRIBE_MODULE_PATH=transcribe_aws

or pass the module path at service-creation time, e.g.

from transcribe import init_transcription_service


service = init_transcription_service(
    module_path="transcribe_aws"
)
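
The two options above can be read as an explicit argument with an environment-variable fallback. The sketch below only illustrates that pattern and is not py-transcribe's actual loader: `resolve_module_path` and `load_implementation` are hypothetical helpers, and the precedence shown (argument first, then TRANSCRIBE_MODULE_PATH) is an assumption.

```python
import importlib
import os
from types import ModuleType
from typing import Mapping, Optional

ENV_VAR = "TRANSCRIBE_MODULE_PATH"


def resolve_module_path(
    module_path: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit module path if given, else the env var value.

    Hypothetical helper: the precedence here is an assumption, not
    documented py-transcribe behavior.
    """
    env = os.environ if environ is None else environ
    resolved = module_path or env.get(ENV_VAR)
    if not resolved:
        raise ValueError(f"pass module_path or set {ENV_VAR}")
    return resolved


def load_implementation(
    module_path: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> ModuleType:
    """Import the module named by the resolved path (hypothetical helper)."""
    return importlib.import_module(resolve_module_path(module_path, environ))
```

Accepting `environ` as a parameter keeps the helper easy to test without touching the real process environment.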

Basic usage

Once you're set up, basic usage looks like this:

from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus
)


service = init_transcription_service()
result = service.transcribe([
    TranscribeJobRequest(
        sourceFile="/some/path/j1.wav"
    ),
    TranscribeJobRequest(
        sourceFile="/some/other/path/j2.wav"
    )
])
for j in result.jobs():
    if j.status == TranscribeJobStatus.SUCCEEDED:
        print(j.transcript)
    else:
        print(j.error)

Handling updates on large/long-running batch jobs

The main transcribe method is synchronous to hide the async/polling-based complexity of most transcribe services. But for any non-trivial batch of transcriptions, you probably do want to receive periodic updates, for example to save any completed transcriptions. You can do that by passing an on_update callback as follows:

from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
    TranscribeJobsUpdate
)


service = init_transcription_service()


def _on_update(u: TranscribeJobsUpdate) -> None:
    for j in u.jobs():
        if j.status == TranscribeJobStatus.SUCCEEDED:
            print(f"save result: {j.transcript}")
        else:
            print(j.error)

result = service.transcribe(
    [
        TranscribeJobRequest(
            sourceFile="/some/path/j1.wav"
        ),
        TranscribeJobRequest(
            sourceFile="/some/other/path/j2.wav"
        )
    ],
    on_update=_on_update
)
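
If updates arrive periodically, the same completed job may show up in more than one update, so a callback that saves results will probably want to save each one only once. The following is a minimal, self-contained sketch of that idea. `JobStatus`, `Job`, and `JobsUpdate` are stand-ins for the library's types, the `job_id` field is a hypothetical identifier, and the assumption that updates repeat earlier jobs is not confirmed by the py-transcribe docs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class JobStatus(Enum):
    """Stand-in for TranscribeJobStatus (values are assumptions)."""
    QUEUED = "QUEUED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class Job:
    """Stand-in for a job entry; job_id is a hypothetical field."""
    job_id: str
    status: JobStatus
    transcript: str = ""
    error: str = ""


@dataclass
class JobsUpdate:
    """Stand-in for TranscribeJobsUpdate, exposing jobs() like the example above."""
    items: List[Job]

    def jobs(self) -> List[Job]:
        return self.items


def make_saving_callback(store: Dict[str, str]) -> Callable[[JobsUpdate], None]:
    """Build an on_update callback that saves each succeeded job once."""
    def _on_update(update: JobsUpdate) -> None:
        for job in update.jobs():
            if job.status == JobStatus.SUCCEEDED and job.job_id not in store:
                store[job.job_id] = job.transcript
    return _on_update


# Simulate two successive updates for the same two-job batch.
saved: Dict[str, str] = {}
on_update = make_saving_callback(saved)
on_update(JobsUpdate([
    Job("j1", JobStatus.SUCCEEDED, transcript="hello"),
    Job("j2", JobStatus.QUEUED),
]))
on_update(JobsUpdate([
    Job("j1", JobStatus.SUCCEEDED, transcript="hello"),
    Job("j2", JobStatus.SUCCEEDED, transcript="world"),
]))
```

In a real project the dictionary would likely be replaced by a file or database write, with the same "already saved" check guarding it.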

Configuring the environment for your implementation

Most implementations will also require other configuration, which you can either set in your environment or pass to init_transcription_service as config={}. See your implementation docs for details.
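
As a rough sketch of the second option, the settings can be collected into a plain dictionary before being passed as `config`. Everything specific here is an assumption: `build_config` is a hypothetical helper, the key names `EXAMPLE_REGION` and `EXAMPLE_BUCKET` are placeholders, and the real keys depend entirely on your implementation's docs.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def build_config(
    keys: Iterable[str],
    overrides: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect the named settings from the environment, then apply overrides.

    Hypothetical helper: py-transcribe does not document this function.
    """
    env = os.environ if environ is None else environ
    config = {key: env[key] for key in keys if key in env}
    config.update(overrides or {})
    return config


# Placeholder key names; substitute the ones your implementation expects.
config = build_config(
    ["EXAMPLE_REGION", "EXAMPLE_BUCKET"],
    overrides={"EXAMPLE_BUCKET": "my-bucket"},
)
# The resulting dict would then be passed along, e.g.:
# service = init_transcription_service(config=config)
```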

Development

Run tests during development with

make test-all

Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)
