Introduction
A generic batch processing framework that manages the orchestration, dispatch, fault tolerance, and monitoring of
arbitrary work items against many endpoints. It is extensible via dependency injection. Worker endpoints can be local,
remote, containers, cloud APIs, different processes, or even just different listener sockets in the same process.
The repository includes examples that run against Azure Cognitive Service containers for ML eval workloads.
Consuming
The framework is extended through the template method pattern and dependency injection. To build on it, provide concrete implementations of the following types:
WorkItemRequest
: Encapsulates all the details needed by the WorkItemProcessor to process a work item.
WorkItemResult
: Representation of the outcome of an attempt to process a WorkItemRequest.
WorkItemProcessor
: Provides the implementation of how to process a WorkItemRequest against an endpoint.
BatchRequest
: Represents a batch of work items to do. Produces a collection of WorkItemRequests.
BatchConfig
: Details needed for a BatchRequest to produce the collection of WorkItemRequests.
BatchRunSummarizer
: Implements a near-real-time status updater based on WorkItemResults as the batch progresses.
EndpointStatusChecker
: Specifies how to determine whether an endpoint is healthy and ready to take on work from a WorkItemProcessor.
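To make the roles of these types concrete, here is a minimal, self-contained sketch of how they could fit together under the template method pattern. It is an illustration only, not the batchkit API: the class names mirror the list above, but every field, method name, and signature (item_id, payload, make_work_items, process, is_healthy, update, run_batch) is an assumption made for this example. The sketch also dispatches sequentially in round-robin order, whereas a real orchestrator would typically dispatch concurrently and retry failures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItemRequest:
    """Everything a processor needs for one work item (hypothetical fields)."""
    item_id: str
    payload: str


@dataclass
class WorkItemResult:
    """Outcome of one attempt to process a WorkItemRequest."""
    item_id: str
    succeeded: bool
    output: str = ""


@dataclass
class BatchConfig:
    """Details a BatchRequest needs to enumerate its work items."""
    payloads: List[str]


class BatchRequest:
    """A batch of work; produces the collection of WorkItemRequests."""

    def __init__(self, config: BatchConfig):
        self.config = config

    def make_work_items(self) -> List[WorkItemRequest]:
        return [WorkItemRequest(str(i), p)
                for i, p in enumerate(self.config.payloads)]


class WorkItemProcessor(ABC):
    """Extension point: how to process one request against an endpoint."""

    @abstractmethod
    def process(self, request: WorkItemRequest, endpoint: str) -> WorkItemResult:
        ...


class EndpointStatusChecker(ABC):
    """Extension point: whether an endpoint is ready to take on work."""

    @abstractmethod
    def is_healthy(self, endpoint: str) -> bool:
        ...


class BatchRunSummarizer:
    """Keeps running totals as results arrive."""

    def __init__(self) -> None:
        self.succeeded = 0
        self.failed = 0

    def update(self, result: WorkItemResult) -> None:
        if result.succeeded:
            self.succeeded += 1
        else:
            self.failed += 1


def run_batch(batch: BatchRequest, endpoints: List[str],
              processor: WorkItemProcessor, checker: EndpointStatusChecker,
              summarizer: BatchRunSummarizer) -> List[WorkItemResult]:
    """Fixed orchestration skeleton; the behavior is supplied by injection."""
    healthy = [e for e in endpoints if checker.is_healthy(e)]
    if not healthy:
        raise RuntimeError("no healthy endpoints available")
    results = []
    for i, request in enumerate(batch.make_work_items()):
        endpoint = healthy[i % len(healthy)]  # simple round-robin dispatch
        result = processor.process(request, endpoint)
        summarizer.update(result)
        results.append(result)
    return results


class UppercaseProcessor(WorkItemProcessor):
    """Toy processor: uppercases the payload; empty payloads fail."""

    def process(self, request: WorkItemRequest, endpoint: str) -> WorkItemResult:
        if not request.payload:
            return WorkItemResult(request.item_id, False)
        return WorkItemResult(request.item_id, True,
                              f"{endpoint}:{request.payload.upper()}")


class AllowListChecker(EndpointStatusChecker):
    """Toy checker: an endpoint is healthy if it is in a known-good set."""

    def __init__(self, healthy: List[str]):
        self.healthy = set(healthy)

    def is_healthy(self, endpoint: str) -> bool:
        return endpoint in self.healthy
```

A consumer of this sketch would construct a BatchRequest from a BatchConfig, then pass it to run_batch along with its own processor, checker, and summarizer. The orchestration skeleton stays fixed while those three collaborators are swapped out.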
The Speech Batch Kit is currently our prime example for consuming the framework.
The batchkit package is available as an ordinary PyPI package. See versions here: https://pypi.org/project/batchkit
Dev Environment
This project is developed for and consumed in Linux environments. Consumers also use WSL2, and other POSIX platforms may be compatible but are untested. For development and deployment outside of a container, we recommend using a Python virtual environment to install the dependencies listed in requirements.txt. The Speech Batch Kit example builds a container.
Tests
This project uses both unit tests (run-tests) and stress tests (run-stress-tests) for functional verification.
Building
There are currently 3 artifacts:
- The PyPI library for the batchkit framework.
- The PyPI library for batchkit-examples-speechsdk.
- The Docker container image for speech-batch-kit.
Examples
Speech Batch Kit
The Speech Batch Kit (batchkit_examples/speech_sdk) uses the framework to produce a tool that can be used for
transcription of very large numbers of audio files against Azure Cognitive Service Speech containers or cloud endpoints.
For an introduction, see the Azure Cognitive Services page.
For detailed information, see the Speech Batch Kit's README.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
and actually do, grant us the rights to use your contribution. For details, visit
https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ
or contact opencode@microsoft.com with any additional questions or comments.