Replicate Python client
This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.
Breaking Changes in 1.0.0
The 1.0.0 release contains breaking changes:
- The replicate.run() method now returns FileOutput objects instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.
To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():
output = replicate.run("acmecorp/acme-model", use_file_output=False)
In most cases, updating existing applications to call output.url should resolve any issues. However, we recommend using the FileOutput objects directly: further improvements to this API are planned, and this approach is guaranteed to give the fastest results.
[!TIP]
👋 Check out an interactive version of this tutorial on Google Colab.
Requirements
Install
pip install replicate
Authenticate
Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.
Grab your token from replicate.com/account and set it as an environment variable:
export REPLICATE_API_TOKEN=<your token>
We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.
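As a minimal sketch of keeping the token out of source code, a script can read it from the environment and fail early with a clear message when it is missing. The helper name load_replicate_token is hypothetical and not part of the Replicate client; only the REPLICATE_API_TOKEN variable name comes from the instructions above.

```python
import os


def load_replicate_token(env=None):
    """Return the API token from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running this script."
        )
    return token
```

Calling load_replicate_token() at the start of a script surfaces a missing token before any API request is attempted.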
Run a model
Create a new Python file and add the following code, replacing the model identifier and input with your own:
>>> import replicate
>>> outputs = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "astronaut riding a rocket like a horse"}
)
[<replicate.helpers.FileOutput object at 0x107179b50>]
>>> for index, output in enumerate(outputs):
    with open(f"output_{index}.webp", "wb") as file:
        file.write(output.read())
replicate.run raises ModelError if the prediction fails.
You can access the exception's prediction property to get more information about the failure.
import replicate
from replicate.exceptions import ModelError

try:
    output = replicate.run("stability-ai/stable-diffusion-3", { "prompt": "An astronaut riding a rainbow unicorn" })
except ModelError as e:
    # Inspect the failed prediction's logs for a known, ignorable issue
    if "(some known issue)" in e.prediction.logs:
        pass

    print("Failed prediction: " + e.prediction.id)
[!NOTE]
By default the Replicate client will hold the connection open for up to 60 seconds while waiting
for the prediction to complete. This is designed to optimize getting the model output back to the
client as quickly as possible.
The timeout can be configured by passing wait=x to replicate.run(), where x is a timeout in seconds between 1 and 60. To disable the sync mode you can pass wait=False.
AsyncIO support
You can also use the Replicate client asynchronously by prepending async_ to the method name.
Here's an example of how to run several predictions concurrently and wait for them all to complete:
import asyncio
import replicate

model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]

async def main():
    # asyncio.TaskGroup requires Python 3.11 or later
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
            for prompt in prompts
        ]

    results = await asyncio.gather(*tasks)
    print(results)

# "async with" and "await" are only valid inside a coroutine
asyncio.run(main())
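When many predictions are queued at once, it can help to cap how many run concurrently. The following is a minimal sketch of that idea using asyncio.Semaphore. The fake_run coroutine is a stand-in for replicate.async_run so the example executes without API calls, and run_all and its limit parameter are hypothetical names, not part of the client.

```python
import asyncio


async def fake_run(prompt):
    # Stand-in for replicate.async_run(...); sleeps instead of calling the API.
    await asyncio.sleep(0.01)
    return f"output for: {prompt}"


async def run_all(prompts, limit=2, run=fake_run):
    # Allow at most `limit` predictions in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(prompt):
        async with semaphore:
            return await run(prompt)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(run_all(["two", "four", "six"]))
print(results)
```

To try this against the real API, one would pass a coroutine that wraps replicate.async_run as the run argument.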
To run a model that takes a file input, you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.
>>> output = replicate.run(
    "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
    input={ "image": open("path/to/mystery.jpg", "rb") }
)
"an astronaut riding a horse"
Run a model and stream its output
Replicate’s API supports server-sent event streams (SSEs) for language models.
Use the stream method to consume tokens as they're produced by the model.
import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")
[!TIP]
Some models, like meta/meta-llama-3-70b-instruct,
don't require a version string.
You can always refer to the API documentation on the model page for specifics.
You can also stream the output of a prediction you create.
This is helpful when you want the ID of the prediction separate from its output.
prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)
for event in prediction.stream():
    print(str(event), end="")
For more information, see
"Streaming output" in Replicate's docs.
Run a model in the background
You can start a model and run it in the background using async mode:
>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"})
>>> prediction
Prediction(...)
>>> prediction.status
'starting'
>>> dict(prediction)
{"id": "...", "status": "starting", ...}
>>> prediction.reload()
>>> prediction.status
'processing'
>>> print(prediction.logs)
iteration: 0, render:loss: -0.6171875
iteration: 10, render:loss: -0.92236328125
iteration: 20, render:loss: -1.197265625
iteration: 30, render:loss: -1.3994140625
>>> prediction.wait()
>>> prediction.status
'succeeded'
>>> prediction.output
<replicate.helpers.FileOutput object at 0x107179b50>
>>> with open("output.png", "wb") as file:
    file.write(prediction.output.read())
Run a model in the background and get a webhook
You can run a model and get a webhook when it completes, instead of waiting for it to finish:
model = replicate.models.get("ai-forever/kandinsky-2.2")
version = model.versions.get("ea1addaab376f4dc227f5368bbd8eff901820fd1cc14ed8cad63b29249e9d463")
prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"},
    webhook="https://example.com/your-webhook",
    webhook_events_filter=["completed"]
)
For details on receiving webhooks, see replicate.com/docs/webhooks.
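On the receiving side, a handler typically parses the request body and branches on the prediction's status. The sketch below assumes the webhook body is the prediction serialized as JSON with id, status, and output fields, plus an error field on failure. The "failed" status and the error field are assumptions not shown in this README, and handle_webhook is a hypothetical name; check the webhook docs for the actual payload.

```python
import json


def handle_webhook(body: bytes):
    """Parse a webhook request body and summarize the prediction.

    Assumes the body is a JSON object with "status" and "output" keys.
    """
    prediction = json.loads(body)
    status = prediction.get("status")
    if status == "succeeded":
        return ("done", prediction.get("output"))
    if status in ("failed", "canceled"):
        # "failed" and the "error" key are assumed, not taken from this README.
        return ("stopped", prediction.get("error"))
    return ("pending", None)
```

A web framework route would pass the raw request body to this function and respond with a 2xx status once the result has been recorded.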
Compose models into a pipeline
You can run a model and feed the output into another model:
laionide = replicate.models.get("afiaka87/laionide-v4").versions.get("b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05")
swinir = replicate.models.get("jingyunliang/swinir").versions.get("660d922d33153019e8c263a3bba265de882e7f4f70396546b6c9c8f9d47a021a")
image = laionide.predict(prompt="avocado armchair")
upscaled_image = swinir.predict(image=image)
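The same chaining idea can be written as a small loop over steps. This is a sketch under stated assumptions: run_pipeline is a hypothetical helper, it expects any callable shaped like run(model_ref, input=...) such as replicate.run, and the model names below are placeholders. A stand-in function replaces the real client so the example runs without API calls.

```python
def run_pipeline(run, steps, first_input):
    """Feed each step's output into the next step.

    `run` is any callable shaped like run(model_ref, input=...);
    `steps` is a list of (model_ref, input_key) pairs.
    """
    value = first_input
    for model_ref, input_key in steps:
        value = run(model_ref, input={input_key: value})
    return value


# Stand-in for replicate.run so the sketch executes offline.
def fake_run(model_ref, input):
    (key, value), = input.items()
    return f"{model_ref}({key}={value})"


result = run_pipeline(
    fake_run,
    [("owner/text-to-image", "prompt"), ("owner/upscaler", "image")],
    "avocado armchair",
)
print(result)
```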
Get output from a running model
Run a model and get its output while it's running:
iterator = replicate.run(
    "pixray/text2image:5c347a4bfa1d4523a58ae614c2194e15f2ae682b57e3797a5bb468920aa70ebf",
    input={"prompts": "san francisco sunset"}
)
for index, image in enumerate(iterator):
    with open(f"file_{index}.png", "wb") as file:
        file.write(image.read())
Cancel a prediction
You can cancel a running prediction:
>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"}
)
>>> prediction.status
'starting'
>>> prediction.cancel()
>>> prediction.reload()
>>> prediction.status
'canceled'
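One common use of cancel() is enforcing a deadline of your own. The following sketch polls a prediction and cancels it once a timeout passes. It relies only on the status attribute and the reload() and cancel() methods shown above; wait_or_cancel is a hypothetical helper, the "failed" status is an assumption, and FakePrediction is a stand-in so the example runs without API calls.

```python
import time


def wait_or_cancel(prediction, timeout=30.0, poll_interval=1.0, sleep=time.sleep):
    """Poll a prediction until it finishes; cancel it if the deadline passes."""
    deadline = time.monotonic() + timeout
    # "failed" is assumed to be a terminal status alongside the two shown above.
    while prediction.status not in ("succeeded", "failed", "canceled"):
        if time.monotonic() >= deadline:
            prediction.cancel()
            prediction.reload()
            break
        sleep(poll_interval)
        prediction.reload()
    return prediction.status


class FakePrediction:
    """Stand-in exposing status, reload(), and cancel() like the examples above."""

    def __init__(self):
        self.status = "starting"
        self._canceled = False

    def reload(self):
        self.status = "canceled" if self._canceled else "processing"

    def cancel(self):
        self._canceled = True


status = wait_or_cancel(FakePrediction(), timeout=0.05, poll_interval=0.01)
print(status)
```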
List predictions
You can list all the predictions you've run:
replicate.predictions.list()
Lists of predictions are paginated. You can get the next page of predictions by passing the next property as an argument to the list method:
page1 = replicate.predictions.list()
if page1.next:
    page2 = replicate.predictions.list(page1.next)
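Following the next cursor generalizes to a generator that walks every page. This is a sketch, assuming each page exposes results and next attributes as used elsewhere in this README. iter_results, FakePage, and fake_list are hypothetical names, and the in-memory pages stand in for replicate.predictions.list so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakePage:
    # Mirrors the two attributes the sketch relies on: `results` and `next`.
    results: list = field(default_factory=list)
    next: Optional[str] = None


def iter_results(list_fn, max_items=None):
    """Yield items across pages by following each page's `next` cursor."""
    page = list_fn()
    count = 0
    while page is not None:
        for item in page.results:
            if max_items is not None and count >= max_items:
                return
            yield item
            count += 1
        page = list_fn(page.next) if page.next else None


# Stand-in for replicate.predictions.list, backed by in-memory pages.
_pages = {
    None: FakePage(["p1", "p2"], "cursor-2"),
    "cursor-2": FakePage(["p3"], None),
}


def fake_list(cursor=None):
    return _pages[cursor]


print(list(iter_results(fake_list)))
```

Passing max_items stops early, which avoids fetching pages that will not be used.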
Load output files
Output files are returned as FileOutput objects:
import replicate
from PIL import Image
output = replicate.run(
"stability-ai/stable-diffusion:27b93a2413e7f36cd83da926f3656280b2931564ff050bf9575f1fdf9bcd7478",
input={"prompt": "wavy colorful abstract patterns, oceans"}
)
with open("my_output.png", "wb") as file:
    file.write(output[0].read())
background = Image.open(output[0])
FileOutput
FileOutput is a file-like object returned from the replicate.run() method that makes it easier to work with models that output files. It implements Iterator and AsyncIterator for reading the file data in chunks, as well as read() and aread() for reading the entire file into memory.
[!NOTE]
At this time, read() and aread() do not accept a size argument to read up to size bytes.
Lastly, the URL of the underlying data source is available on the url attribute, though we recommend you use the object as an iterator or use its read() or aread() methods, as the url property may not always return HTTP URLs in the future.
print(output.url)
To consume the file directly:
with open('output.bin', 'wb') as file:
    file.write(output.read())
Very large files can be streamed in chunks instead:
with open(file_path, 'wb') as file:
    for chunk in output:
        file.write(chunk)
Each of these methods has an equivalent asyncio API.
import aiofiles

# Open in binary mode: aread() and the async iterator both yield bytes
async with aiofiles.open(filename, 'wb') as file:
    await file.write(await output.aread())
async with aiofiles.open(filename, 'wb') as file:
    async for chunk in output:
        await file.write(chunk)
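The synchronous chunked write can be wrapped in a small helper that also reports how many bytes were saved. This is a sketch: save_chunks is a hypothetical name, and a plain iterator of bytes stands in for a FileOutput so the example runs without API calls.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to `path`; return bytes written."""
    written = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            written += file.write(chunk)
    return written


# Stand-in for a FileOutput: any iterable that yields bytes works here.
fake_output = iter([b"abc", b"defg"])
path = os.path.join(tempfile.mkdtemp(), "output.bin")
total = save_chunks(fake_output, path)
print(total)
```

Because only one chunk is held in memory at a time, this pattern suits outputs that are too large to load with read().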
Common web frameworks all support streaming responses from Iterator types:
Django
from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def stream_response(request):
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return HttpResponse(output, content_type='image/webp')
FastAPI
from fastapi.responses import StreamingResponse

@app.get("/")
async def main():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return StreamingResponse(output)
Flask
from flask import stream_with_context

@app.route('/stream')
def streamed_response():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return app.response_class(stream_with_context(output))
You can opt out of FileOutput by passing use_file_output=False to the replicate.run() method.
output = replicate.run("acmecorp/acme-model", use_file_output=False)
List models
You can list the models you've created:
replicate.models.list()
Lists of models are paginated. You can get the next page of models by passing the next property as an argument to the list method, or you can use the paginate method to fetch pages automatically.
# Automatic pagination using replicate.paginate
models = []
for page in replicate.paginate(replicate.models.list):
    models.extend(page.results)
    if len(models) > 100:
        break

# Manual pagination using next cursors
models = []
page = replicate.models.list()
while page:
    models.extend(page.results)
    if len(models) > 100:
        break
    page = replicate.models.list(page.next) if page.next else None
You can also find collections of featured models on Replicate:
>>> collections = [collection for page in replicate.paginate(replicate.collections.list) for collection in page]
>>> collections[0].slug
"vision-models"
>>> collections[0].description
"Multimodal large language models with vision capabilities like object detection and optical character recognition (OCR)"
>>> replicate.collections.get("text-to-image").models
[<Model: stability-ai/sdxl>, ...]
Create a model
You can create a model for a user or organization with a given name, visibility, and hardware SKU:
import replicate
model = replicate.models.create(
    owner="your-username",
    name="my-model",
    visibility="public",
    hardware="gpu-a40-large"
)
Here's how to list all the available hardware for running models on Replicate:
>>> [hw.sku for hw in replicate.hardware.list()]
['cpu', 'gpu-t4', 'gpu-a40-small', 'gpu-a40-large']
Fine-tune a model
Use the training API to fine-tune models to make them better at a particular task. To see what language models currently support fine-tuning, check out Replicate's collection of trainable language models.
If you're looking to fine-tune image models, check out Replicate's guide to fine-tuning image models.
Here's how to fine-tune a model on Replicate:
training = replicate.trainings.create(
    model="stability-ai/sdxl",
    version="39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "input_images": "https://my-domain/training-images.zip",
        "token_string": "TOK",
        "caption_prefix": "a photo of TOK",
        "max_train_steps": 1000,
        "use_face_detection_instead": False
    },
    destination="your-username/model-name"
)
Customize client behavior
The replicate package exports a default shared client. This client is initialized with an API token set by the REPLICATE_API_TOKEN environment variable.
You can create your own client instance to pass a different API token value, add custom headers to requests, or control the behavior of the underlying HTTPX client:
import os
from replicate.client import Client

replicate = Client(
    api_token=os.environ["SOME_OTHER_REPLICATE_API_TOKEN"],
    headers={
        "User-Agent": "my-app/1.0"
    }
)
[!WARNING]
Never hardcode authentication credentials like API tokens into your code.
Instead, pass them as environment variables when running your program.
Development
See CONTRIBUTING.md