abottle
Triton/TensorRT/ONNX Runtime/PyTorch Python server wrapper.
Put your model into a bottle, and you get a working server and more.
Demo
import numpy as np
from transformers import AutoTokenizer


class MiniLM:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained(
            "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
        )

    def predict(self, X):
        encode_dict = self.tokenizer(
            X, padding="max_length", max_length=128, truncation=True
        )
        input_ids = np.array(encode_dict["input_ids"], dtype=np.int32)
        attention_mask = np.array(encode_dict["attention_mask"], dtype=np.int32)
        # self.model is injected by abottle according to the configured wrapper
        outputs = self.model.infer(
            {"input_ids": input_ids, "attention_mask": attention_mask}, ["y"]
        )
        return outputs["y"]

    class Config:
        class TritonModel:
            name = "minilm"
            version = "2"
You can write a class like this, and then start it with abottle:
abottle main.MiniLM
By default, abottle runs as a server listening on 0.0.0.0:8081.
curl localhost:8081/predict
abottle will inject an attribute named model into your class, so you don't need to care which runtime is behind it.
It can be PyTorch with cuDNN 8 or an optimized TensorRT plan, depending on the config you provide:
self.model.infer({"input1": input1_tensor, "input2": input2_tensor}, ['output_1'])
Config with shell:
abottle main.MiniLM --config """TritonModel:
    triton_url: localhost
    name: minilm
    version: 2
"""
Config with file:
abottle main.MiniLM --config <config yaml file path>
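The file carries the same YAML as the inline string above. A minimal example follows; the file name config.yaml is only an illustration.

```yaml
# config.yaml -- same content as the inline --config string above
TritonModel:
    triton_url: localhost
    name: minilm
    version: 2
```

You would then start the server with abottle main.MiniLM --config config.yaml.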
import numpy as np
import pandas as pd
from transformers import AutoTokenizer
from typing import List


class MiniLM:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained(
            "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
        )

    def cosine(self, a: List[List[float]], b: List[List[float]]) -> np.ndarray:
        # pairwise cosine similarity matrix between the rows of a and b
        a, b = np.array(a), np.array(b)
        sqrt_square_a = np.tile(
            np.sqrt(np.sum(np.square(a), axis=1)).reshape((a.shape[0], 1)),
            (1, b.shape[0]),
        )
        sqrt_square_b = np.tile(
            np.sqrt(np.sum(np.square(b.T), axis=0)).reshape((1, b.shape[0])),
            (a.shape[0], 1),
        )
        score_matrix = np.divide(np.dot(a, b.T), sqrt_square_a * sqrt_square_b)
        return score_matrix

    def predict(self, X: List[str]) -> List[List[float]]:
        encode_dict = self.tokenizer(
            X, padding="max_length", max_length=128, truncation=True
        )
        input_ids = np.array(encode_dict["input_ids"], dtype=np.int32)
        attention_mask = np.array(encode_dict["attention_mask"], dtype=np.int32)
        outputs = self.model.infer(
            {"input_ids": input_ids, "attention_mask": attention_mask}, ["y"]
        )
        return outputs["y"]

    def evaluate(self, file_path: str, batch_size: int) -> float:
        test_data = pd.read_csv(file_path, sep=", ", names=["query", "label"])
        query, label = test_data["query"].tolist(), test_data["label"].tolist()
        assert len(query) == len(label)
        query_embedding, label_embedding = [], []
        for i in range(0, len(query), batch_size):
            query_embedding += self.predict(query[i : min(i + batch_size, len(query))])
            label_embedding += self.predict(label[i : min(i + batch_size, len(label))])
        assert len(query_embedding) == len(label_embedding)
        score_matrix = self.cosine(query_embedding, label_embedding)
        # top-1 accuracy: how often the best-scoring match lies on the diagonal
        raw_result = np.argmax(score_matrix, axis=0) == np.arange(score_matrix.shape[0])
        unique, counts = np.unique(raw_result, return_counts=True)
        top_1_accuracy = counts[unique.tolist().index(True)] / np.sum(counts)
        return top_1_accuracy
def evaluate can be used as a tester, like below:
abottle main.MiniLM --as tester file_path='test.csv', batch_size=100
The arguments you define in the evaluate function can be set as CLI args in the format xxx=xxx.
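For reference, the evaluate method above reads the file with sep=", " and names=["query", "label"], so each line is expected to hold a query and its label separated by a comma and a space:

```text
<query text>, <label text>
<query text>, <label text>
```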
You can use different wrappers for your model, including:
- abottle.ONNXModel
- abottle.TensorRTModel
- abottle.TritonModel
- abottle.PytorchModel
If you want to add more wrappers, you can implement abottle.BaseModel yourself; a rough sketch follows the command below.
abottle main.MiniLM --as server --wrapper abottle.TritonModel
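The exact abottle.BaseModel interface is not documented here, so the class and method shape below are assumptions; only the infer(inputs, output_names) call style is taken from the examples above.

```python
# Hypothetical sketch of a custom wrapper; not the actual abottle API.
# Assumption: a wrapper subclasses abottle.BaseModel and exposes an
# infer(inputs, output_names) method shaped like the calls shown above.
import numpy as np
import abottle


class IdentityModel(abottle.BaseModel):
    """Toy wrapper that echoes its first input, handy for wiring tests."""

    def infer(self, input_tensors, output_names):
        # Return a dict keyed by the requested output names, mirroring how
        # the Triton/ONNX wrappers are consumed (e.g. outputs["y"]).
        first_input = next(iter(input_tensors.values()))
        return {name: np.asarray(first_input) for name in output_names}
```

You would then select it the same way as the built-in wrappers, e.g. abottle main.MiniLM --as server --wrapper mypackage.IdentityModel, assuming --wrapper accepts any importable class path.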
Configs
abottle.ONNXModel
ONNXModel:
    ort_file: 'the ort file path'
abottle.TensorRTModel
TensorRTModel:
    trt_file: 'TensorRT plan file path'
abottle.TritonModel
TritonModel:
    name: "your model's name on the Triton server"
    version: "your model's version on the Triton server"
    triton_url: "the Triton server's host without a scheme (http://xxx is invalid)"
abottle.PytorchModel (not fully implemented)
PytorchModel:
    model: 'pytorch importable name'
Motivation
As a DL model creator, you don't need to focus on how to serve a model, how to test its performance on a target platform, or how to optimize it without losing accuracy. Just find a bottle and put your logic code into it; the DL engineers can do those things for you. All you need to do is export your model to an ONNX file and write logic code like the examples above.
Feature
We will make this bottle as strong as possible, so that it becomes a standard interface across the MLOps cycle. Expect to see more and more scenarios covered by it: optimization, graph fusion, performance testing, deployment, data gathering, and so on.