.. image:: https://img.shields.io/pypi/v/ezsmdeploy.svg
   :target: https://pypi.python.org/pypi/ezsmdeploy
   :alt: Latest Version

.. image:: https://img.shields.io/badge/code_style-black-000000.svg
   :target: https://github.com/python/black
   :alt: Code style: black

.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
   :target: https://opensource.org/licenses/MIT
   :alt: License: MIT

.. image:: https://img.shields.io/badge/Made%20With-Love-orange.svg
   :target: https://pypi.python.org/pypi/ezsmdeploy
   :alt: Made With Love

.. image:: https://img.shields.io/badge/Gen-AI-8A2BE2
   :target: https://pypi.python.org/pypi/ezsmdeploy
   :alt: GenAI
The Ezsmdeploy Python SDK helps you easily deploy machine learning models on Amazon SageMaker. It provides a rich set of features, such as deploying models from hubs (like Hugging Face or SageMaker JumpStart), passing one or more model files (including multi-model deployments), automatically choosing an instance based on model size or on a budget, and load testing endpoints using an intuitive API. Ezsmdeploy uses the SageMaker Python SDK, an open source library for training and deploying machine learning models on Amazon SageMaker. This SDK, however, focuses on further simplifying deployment of existing, already trained models, so it is for you if your main goal is to get a trained model behind an endpoint with minimal effort.
Note: for some SageMaker estimators, deployment from pretrained models is already easy; consider the TensorFlow SavedModel format. You can very easily tar your saved_model.pb and variables files and use sagemaker.tensorflow.serving.Model to register and deploy your model. Nevertheless, if your TF model is saved as checkpoints, as an HDF5 file, or as a TFLite file, or if you have deployment needs across multiple types of serialized model files, this library may help standardize your deployment pipeline and avoid building a new container for each model.
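For comparison, here is a minimal sketch of that TensorFlow-only path, assuming a SavedModel directory; the `export/Servo/1` path, bucket prefix, and framework version below are illustrative, and `get_execution_role()` assumes you are running in a SageMaker notebook:
::
import tarfile
import sagemaker
from sagemaker.tensorflow.serving import Model  # SageMaker Python SDK v1-style import

# package saved_model.pb and the variables/ folder into model.tar.gz
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export/Servo/1", arcname="1")  # TensorFlow Serving expects a numbered model folder

sess = sagemaker.Session()
model_data = sess.upload_data("model.tar.gz", key_prefix="tf-savedmodel")

model = Model(model_data=model_data,
              role=sagemaker.get_execution_role(),
              framework_version="1.15")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")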
* `Installing the Ezsmdeploy Python SDK <#installing-the-ezsmdeploy-python-sdk>`__
* `Key Features <#key-features>`__
* `Other Features <#other-features>`__
* `Model script requirements <#model-script-requirements>`__
* `Sample notebooks <#sample-notebooks>`__
* `Known gotchas <#known-gotchas>`__
Installing the Ezsmdeploy Python SDK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Ezsmdeploy Python SDK is published to PyPI and has the following dependencies: sagemaker>=1.55.3, yaspin==0.16.0, shortuuid==1.0.1, and locustio==0.14.5. Ezsmdeploy can be installed with pip as follows:
::
pip install ezsmdeploy
Make sure you upgrade to the latest stable version of ezsmdeploy if you have been using this library in the past:
::
pip install -U ezsmdeploy
To install locustio for testing, do:
::
pip install ezsmdeploy[locust]
The cleanest way to install this package is within a virtualenv:
::
python -m venv env
source env/bin/activate
pip install ezsmdeploy[locust]
In some cases, installation fails because of an existing package called "greenlet". This is not a direct dependency of ezsmdeploy, but it interferes with the installation. To fix this, either install within a virtualenv as shown above, or do:
::
pip install ezsmdeploy[locust] --ignore-installed greenlet
If you have another way to test the endpoint, or want to manage locust on your own, just do:
::
pip install ezsmdeploy
Key Features
~~~~~~~~~~~~
At minimum, **ezsmdeploy** requires you to provide:
1. one or more model files
2. a python script with two functions: i) *load_model(modelpath)* - loads a model from a modelpath and returns a model object and ii) *predict(model,input)* - performs inference based on a model object and input data
3. a list of requirements or a requirements.txt file
For example, you can do:
::
ezonsm = ezsmdeploy.Deploy(model = 'model.pth',
script = 'modelscript_pytorch.py',
requirements = ['numpy','torch','joblib'])
You can also load multiple models ...
::
ezonsm = ezsmdeploy.Deploy(model = ['model1.pth','model2.pth'],
script = 'modelscript_pytorch.py',
requirements = ['numpy','torch','joblib'])
... or download tar.gz models from S3
::
ezonsm = ezsmdeploy.Deploy(model = ['s3://ezsmdeploy/pytorchmnist/model.tar.gz'],
script = 'modelscript_pytorch.py',
requirements = 'path/to/requirements.txt')
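The Deploy call returns an object that wraps the created endpoint; as shown later in this README, it exposes a `predictor` you can invoke directly as well as a `test` method. A minimal invocation sketch (the payload below is purely illustrative; your model script decides how to parse the raw bytes it receives):
::
import numpy as np

# illustrative payload: a flat float array serialized to bytes
payload = np.random.rand(1, 64).astype(np.float64).tobytes()

# ezonsm.predictor points at the live SageMaker endpoint created by Deploy
result = ezonsm.predictor.predict(payload)
print(result)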
Other Features
~~~~~~~~~~~~~~
The Deploy class is initialized with these parameters:
::
class Deploy(object):
    def __init__(
        self,
        model,
        script=None,
        framework=None,
        requirements=None,
        dependencies=None,
        name=None,
        autoscale=False,
        autoscaletarget=1000,
        serverless=False,
        serverless_memory=4096,
        serverless_concurrency=10,
        wait=True,
        wait_time=300,
        bucket=None,
        prefix="",
        volume_size=None,
        session=None,
        image=None,
        dockerfilepath=None,
        dockerextras=[],
        instance_type=None,
        instance_count=1,
        budget=100,
        ei=None,
        monitor=False,
        asynchronous=False,
        foundation_model=False,
        foundation_model_version="*",
        huggingface_model=False,
        huggingface_model_task=None,
        huggingface_model_quantize=None,
    ):
Let's look at a few of these parameters in action. For example, you can set the `framework` explicitly:
::
ezonsm = ezsmdeploy.Deploy(model = ... ,
script = ... ,
framework = 'sklearn')
... or pass a `name` of your choice (otherwise a name is generated for you):
::
ezonsm = ezsmdeploy.Deploy(model = ... ,
script = ... ,
framework = 'sklearn',
name = 'randomname')
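The remaining parameters in the signature above can be combined in the same way. Here is a hedged sketch using only parameters that appear in that signature; the values are illustrative, and the comments reflect what the parameter names suggest rather than documented behavior:
::
ezonsm = ezsmdeploy.Deploy(model = 'model.pth',
                           script = 'modelscript_pytorch.py',
                           requirements = ['numpy','torch','joblib'],
                           instance_type = 'ml.m5.xlarge',  # pin the instance type instead of automatic selection
                           instance_count = 2,              # number of instances behind the endpoint
                           autoscale = True,                # enable endpoint autoscaling
                           autoscaletarget = 1000,          # autoscaling target (default value from the signature)
                           wait = True)                     # block until the deployment finishes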
You can now deploy state-of-the-art models like Falcon, FLAN-T5, and BLOOM directly from Hugging Face or SageMaker JumpStart, without having to build custom containers or write complex deployment code. For example, here is the code to deploy the 40B-parameter Falcon instruct model from Hugging Face:
::
ez_falcon = Deploy(model="tiiuae/falcon-40b-instruct",
foundation_model=True,
huggingface_model=True)
|
You can combine multiple flags. For example, to deploy a Hugging Face foundation model on a serverless endpoint, just enable the serverless flag:
::
ez_tinybert = ezsmdeploy.Deploy(model = "Intel/dynamic_tinybert",
huggingface_model=True,
huggingface_model_task='question-answering',
serverless=True,
serverless_memory=6144
)
payload = {"inputs": {
"question": "Who discovered silk?",
"context": "Legend has it that the process for making silk cloth was first invented by the wife of the Yellow Emperor, Leizu, around the year 2696 BC. The idea for silk first came to Leizu while she was having tea in the imperial gardens." + "The production of silk originates in China in the Neolithic (Yangshao culture, 4th millennium BCE). Silk remained confined to China until the Silk Road opened at some point during the later half of the first millennium BCE. "
}}
response = ez_tinybert.predictor.predict(payload)
A typical deployment prints timestamped progress as it runs, for example:
::
0:00:00.143132 | compressed model(s)
0:00:00.403894 | uploaded model tarball(s) ; check returned modelpath
0:00:00.404948 | added requirements file
0:00:00.406745 | added source file
0:00:00.408180 | added Dockerfile
0:00:00.409959 | added model_handler and docker utils
0:00:00.410072 | building docker container
0:01:59.298091 | built docker container
0:01:59.647986 | created model(s). Now deploying on ml.m5.xlarge
0:09:31.904897 | deployed model
0:09:31.905450 | estimated cost is $0.3 per hour
0:09:31.905805 | Done! ✔
Once your model is deployed, you can load test the endpoint with Locust:
::
ezonsm.test(input_data, target_model='model1.tar.gz')
or
::
ezonsm.test(input_data, target_model='model1.tar.gz',usercount=20,hatchrate=10,timeoutsecs=10)
... to override the default arguments. Read more about Locust here: https://docs.locust.io/en/stable/
Model Script requirements
~~~~~~~~~~~~~~~~~~~~~~~~~
Make sure your model script has a load_model() and a predict() function. While you can still use SageMaker's serializers and deserializers, assume that you will get a payload in bytes and that you have to return a prediction in bytes. What you do in between is up to you. For example, your model script may look like:
::
import os
import numpy as np
from joblib import load

def load_model(modelpath):
    # load the serialized scikit-learn model from the model directory
    clf = load(os.path.join(modelpath, 'model.joblib'))
    return clf

def predict(model, payload):
    try:
        # in remote / container based deployments, the payload comes in as a stream of bytes
        out = [str(model.predict(np.frombuffer(payload[0]['body']).reshape((1, 64))))]
    except Exception as e:
        out = [type(payload), str(e)]  # useful for debugging!
    return out
Note that when using multi-model mode, the payload comes in as a dictionary, and the raw bytes sent in can be accessed using payload[0]['body']. In Flask-based deployments, you can use the payload as-is (it comes in as bytes).
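If you want a single model script to work in both modes, a small hedged sketch (the `_payload_to_bytes` helper and the (1, 64) shape are illustrative, based only on the payload shapes described above) could normalize the payload first:
::
import numpy as np

def _payload_to_bytes(payload):
    # flask-nginx mode: the payload already arrives as raw bytes
    if isinstance(payload, (bytes, bytearray)):
        return payload
    # multi-model mode: the raw bytes are at payload[0]['body'], as described above
    return payload[0]['body']

def predict(model, payload):
    data = np.frombuffer(_payload_to_bytes(payload)).reshape((1, 64))
    return [str(model.predict(data))]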
Large Language models
~~~~~~~~~~~~~~~~~~~~~
EzSMDeploy supports deploying foundation models through SageMaker JumpStart as well as Hugging Face. General guidance:
1. Jumpstart models - `foundation_model=True`
2. Large huggingface models - `foundation_model=True, huggingface_model=True`
3. Small huggingface models - `huggingface_model=True`
4. Tiny models - `serverless=True`
To deploy models using Jumpstart:
::
ezonsm = ezsmdeploy.Deploy(model = "huggingface-text2text-flan-ul2-bf16",
foundation_model=True)
Note that with JumpStart models, ezsmdeploy can automatically retrieve default/suggested instance types from SageMaker.
To deploy a Hugging Face LLM (this uses the Hugging Face LLM container):
::
ezonsm = ezsmdeploy.Deploy(model = "tiiuae/falcon-40b-instruct",
foundation_model=True,
huggingface_model=True,
huggingface_model_task='text-generation',
instance_type="ml.g4dn.12xlarge"
)
(See the release notes for models we have tested so far, along with instance types that worked.)
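Once the endpoint is up, you can invoke it through the returned `predictor`. A minimal sketch, assuming the usual request shape for text-generation containers (an `inputs` string plus an optional `parameters` dict; the exact fields and response format may vary by container version):
::
payload = {
    "inputs": "Write a short poem about deploying models on SageMaker.",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7}  # assumed generation settings
}
response = ezonsm.predictor.predict(payload)
print(response)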
Note that at the time of writing, the officially supported model architectures for LLMs on the Hugging Face LLM container are:
- BLOOM / BLOOMZ
- MT0-XXL
- Galactica
- SantaCoder
- GPT-Neox 20B (joi, pythia, lotus, rosey, chip, RedPajama, open assistant)
- FLAN-T5-XXL (T5-11B)
- Llama (vicuna, alpaca, koala)
- Starcoder / SantaCoder
- Falcon 7B / Falcon 40B
Serverless inference
~~~~~~~~~~~~~~~~~~~~
Simply set `serverless=True`. Make sure you size your serverless endpoint correctly using `serverless_memory` and `serverless_concurrency`. You can combine other features as well; for example, to deploy a Hugging Face model on a serverless endpoint, use:
::
ezonsm = ezsmdeploy.Deploy(model = "distilbert-base-uncased-finetuned-sst-2-english",
huggingface_model=True,
huggingface_model_task='text-classification',
serverless=True
)
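A hedged invocation sketch for the endpoint above, assuming the standard Hugging Face text-classification request and response shapes (an `inputs` field in, a list of label/score dicts out; verify against your deployed container):
::
payload = {"inputs": "Ezsmdeploy made this deployment painless!"}
response = ezonsm.predictor.predict(payload)
print(response)  # typically something like [{"label": "POSITIVE", "score": 0.99}] for this model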
Supported Operating Systems
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ezsmdeploy SDK has been tested on Unix/Linux.
Supported Python Versions
~~~~~~~~~~~~~~~~~~~~~~~~~

Ezsmdeploy SDK has been tested on Python 3.6; it should run on higher versions as well.
AWS Permissions
~~~~~~~~~~~~~~~
Ezsmdeploy uses the SageMaker Python SDK.
As a managed service, Amazon SageMaker performs operations on your behalf on the AWS hardware that is managed by Amazon SageMaker.
Amazon SageMaker can perform only operations that the user permits.
You can read more about which permissions are necessary in the `AWS Documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html>`__.
The SageMaker Python SDK should not require any additional permissions aside from what is required for using SageMaker.
However, if you are using an IAM role with a path in it, you should grant permission for ``iam:GetRole``.
Licensing
~~~~~~~~~
Ezsmdeploy is licensed under the MIT license and uses the SageMaker Python SDK. SageMaker Python SDK is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/
Sample Notebooks
~~~~~~~~~~~~~~~~~
https://github.com/aws-samples/easy-amazon-sagemaker-deployments/tree/master/notebooks
Known Gotchas
~~~~~~~~~~~~~~~~~~
* Ezsmdeploy uses the SageMaker Python SDK under the hood, so any limitations / limits / restrictions of that SDK are expected to carry over
|
* Ezsmdeploy builds your docker container on the fly and uses two types of base containers - a flask-nginx deployment stack or the Multi Model Server (MMS). Sending in a single model, or choosing a GPU instance, will default to the flask-nginx stack. You can force the use of the MMS stack by passing a single model as a list, for example, ['model1.joblib'] (a sketch of this rule appears after this list)
|
* Ezsmdeploy uses a local 'src' folder as a staging folder, which is reset at the beginning of every deploy. Consider using the package in separate project folders so there is no overlap or overwriting of staging files.
|
* Ezsmdeploy uses Locust to do endpoint testing - any restrictions of the locustio package are also expected to be seen here.
|
* Ezsmdeploy has been tested from Sagemaker notebook instances (both GPU and non-GPU).
|
* The payload comes in as bytes; you can also use Sagemaker's serializer and deserializers to send in other formats of input data
|
* Not all feature combinations are tested; any contributions testing, for example, budget constraints are welcome!
|
* If you are doing local testing in a container, make sure you kill any running containers, since all invocations hit the same port. To do this, run:
::
docker container stop $(docker container ls -aq) >/dev/null
* If your docker push fails, chances are that your disk is full. Try clearing some docker images:
::
docker system prune -a
* If you encounter an "image does not exist" error, try manually running the build script that is left behind after an unsuccessful run. For this, do:
::
./src/build-docker.sh
* Locust load testing on a local endpoint has not been tested (and may not make much sense). Please use .test() on remote deployments
|
* Use instance_type "local" if you would like to test locally (this lets you test using the MMS stack). If you intend to finally deploy your model to a GPU instance, use "local_gpu" - this launches the flask-nginx stack locally and the same stack when you deploy to a GPU.
|
* At the time of writing this guide, launching a multi-model server from SageMaker does not support GPUs (though the open source MMS repository has no such restriction). Ezsmdeploy checks the number of models passed in, the instance type, and other parameters to decide which stack to build for your endpoint.
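To summarize the stack-selection behavior described above in code, here is a hedged paraphrase of the documented rule. This is not ezsmdeploy's actual implementation; the function name and the GPU check are invented for illustration:
::
def choose_stack(models, instance_type):
    # rough paraphrase of the rule documented in the gotchas above
    gpu = instance_type == "local_gpu" or (instance_type or "").startswith(("ml.p", "ml.g"))
    if isinstance(models, list) and not gpu:
        # one or more models passed in as a list, on a non-GPU instance -> Multi Model Server
        return "mms"
    # a single model (not wrapped in a list) or a GPU instance -> flask-nginx stack
    return "flask-nginx"

print(choose_stack(['model1.joblib'], 'ml.m5.xlarge'))   # mms
print(choose_stack('model1.pth', 'ml.g4dn.12xlarge'))    # flask-nginx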
CONTRIBUTING
------------
Please submit a pull request to the package's Git repo.