🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!
What is BentoML?
BentoML is a Python library for building online serving systems optimized for AI apps and model inference.
🍱 Easily build APIs for Any AI/ML Model. Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
🐳 Docker Containers made simple. No more dependency hell! Manage your environments, dependencies and model versions with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you deploy to different environments.
🧭 Maximize CPU/GPU utilization. Build high-performance inference APIs by leveraging built-in serving optimization features such as dynamic batching, model parallelism, multi-stage pipelines, and multi-model inference-graph orchestration.
👩‍💻 Fully customizable. Easily implement your own APIs or task queues, with custom business logic, model inference and multi-model composition. Supports any ML framework, modality, and inference runtime.
🚀 Ready for Production. Develop, run and debug locally. Seamlessly deploy to production with Docker containers or BentoCloud.
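To make the dynamic batching feature mentioned above more concrete, here is a minimal, framework-independent sketch of the idea: individual requests are queued, grouped for a short time window, and passed to the model in a single call. This is an illustration only and not BentoML's implementation; `DynamicBatcher`, `fake_model`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for this example.

```python
import asyncio
from typing import Callable, List


class DynamicBatcher:
    """Groups individual requests into batches (illustrative, not BentoML's code)."""

    def __init__(self, batch_fn: Callable[[List[str]], List[str]],
                 max_batch_size: int = 4, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn              # function that processes a whole batch
        self.max_batch_size = max_batch_size  # upper bound on items per model call
        self.max_wait_s = max_wait_s          # how long to wait for a batch to fill
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: str) -> str:
        """Enqueue one input and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self) -> None:
        """Worker loop: collect queued inputs, call batch_fn once per group."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first item arrives
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                outputs = self.batch_fn([item for item, _ in batch])
            except Exception as exc:  # propagate the failure to every waiting caller
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def main():
    calls: List[int] = []  # records the size of each batch the "model" received

    def fake_model(batch: List[str]) -> List[str]:
        calls.append(len(batch))
        return [text.upper() for text in batch]

    # Create the batcher inside the running event loop.
    batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"req-{i}") for i in range(6)))
    worker.cancel()
    return results, calls


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Six concurrent requests reach the stand-in model in fewer calls than requests, which is the effect that improves accelerator utilization. The trade-off is latency: a larger `max_wait_s` fills batches more often but delays the first request in each batch.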
pip install torch transformers # additional dependencies for local run
bentoml serve service.py:Summarization
Now you can run inference from your browser at http://localhost:3000 or with a Python script:
import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
Deploying your first Bento
To deploy your BentoML Service code, first create a bentofile.yaml file to define its dependencies and environments. Find the full list of bentofile options here.
service: 'service:Summarization' # Entry service import path
include:
  - '*.py' # Include all .py files in current directory
python:
  packages: # Python dependencies to include
    - torch
    - transformers
docker:
  python_version: "3.11"
Then, choose one of the following ways for deployment:
🐳 Docker Container
Run bentoml build to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:
bentoml build
Ensure Docker is running. Generate a Docker container image for deployment:
bentoml containerize summarization:latest
Run the generated image:
docker run --rm -p 3000:3000 summarization:latest
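A container that loads a model can take a while before it accepts traffic, so a deployment script may want to wait for the server before sending requests. The sketch below polls a readiness URL using only the standard library. It assumes the server answers health checks at `/readyz` on the published port; verify that path for your BentoML version. `http_probe` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Call `probe` repeatedly until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() + interval_s > deadline:
            return False  # the next attempt would exceed the deadline
        time.sleep(interval_s)
```

For the container started above, a call might look like `wait_until_ready(lambda: http_probe("http://localhost:3000/readyz"))`. Passing the probe as a callable keeps the retry loop testable without a running server.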
☁️ BentoCloud
BentoCloud provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up your BentoML development by leveraging cloud compute resources, and simplifies how you deploy, scale, and operate BentoML in production.
Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.
To report a bug or suggest a feature request, use GitHub Issues.
Contributing
There are many ways to contribute to the project:
Report bugs and give a "thumbs up" to issues that are relevant to you.
Share your feedback and discuss roadmap plans in the #bentoml-contributors channel here.
Thanks to all of our amazing contributors!
Usage tracking and feedback
The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option.