Test, evaluate & monitor your AI application
🐦 Twitter/X
•
📢 Discord
•
Parea AI
•
📙 Documentation
Parea AI provides a SDK to evaluate & monitor your AI applications. Below you can see quickstarts to:
Our full docs are here.
Installation
pip install -U parea-ai
or install with Poetry
poetry add parea-ai
Evaluating Your LLM App
Testing your AI app means to execute it over a dataset and score it with an evaluation function.
This is done in Parea by defining & running experiments.
Below you can see can example of how to test a greeting bot with the Levenshtein distance metric.
from parea import Parea, trace
from parea.evals.general import levenshtein
p = Parea(api_key="<<PAREA_API_KEY>>")
@trace(eval_funcs=[levenshtein])
def greeting(name: str) -> str:
return f"Hello {name}"
data = [
{"name": "Foo", "target": "Hi Foo"},
{"name": "Bar", "target": "Hello Bar"},
]
p.experiment(
name="Greeting",
data=data,
func=greeting,
).run()
In the snippet above, we used the trace
decorator to capture any inputs & outputs of the function.
This decorator also enables to score the output by executing the levenshtein
eval in the background.
Then, we defined an experiment via p.experiment
to evaluate our function (greeting
) over a dataset (here a list of dictionaries).
Finally, calling run
will execute the experiment, and create a report of outputs, scores & traces for any sample of the dataset.
You can find a link to the executed experiment here. (todo: fill-in experiment)
More Resources
Read more about how to write, run & analyze experiments in our docs.
Logging & Observability
By wrapping the respective clients, you can automatically log all your LLM calls to OpenAI & Anthropic.
Additionally, using the trace
decorator you can create hierarchical traces of your LLM application to e.g. associate LLM calls with the retrieval step of a RAG pipeline.
You can see the full observability documentation here and our integrations into LangChain, Instructor, DSPy, LiteLLM & more here.
Automatically log all your OpenAI calls
To automatically log any OpenAI call, you can wrap the OpenAI client with the Parea client using the wrap_openai_client
method.
from openai import OpenAI
from parea import Parea
client = OpenAI(api_key="OPENAI_API_KEY")
p = Parea(api_key="PAREA_API_KEY")
p.wrap_openai_client(client)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": "Write a Hello World program in Python using FastAPI.",
}
],
)
print(response.choices[0].message.content)
Automatically log all your Anthropic calls
To automatically log any Anthropic call, you can wrap the Anthropic client with the Parea client using the wrap_anthropic_client
method.
import anthropic
from parea import Parea
p = Parea(api_key="PAREA_API_KEY")
client = anthropic.Anthropic()
p.wrap_anthropic_client(client)
message = client.messages.create(
model="claude-3-opus-20240229",
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Write a Hello World program in Python using FastAPI.",
}
],
)
print(message.content[0].text)
Nested traces
By using the trace
decorator, you can create hierarchical traces of your LLM application.
from openai import OpenAI
from parea import Parea, trace
client = OpenAI(api_key="OPENAI_API_KEY")
p = Parea(api_key="PAREA_API_KEY")
p.wrap_openai_client(client)
def llm(messages: list[dict[str, str]]) -> str:
response = client.chat.completions.create(model="gpt-4o", messages=messages)
return response.choices[0].message.content
@trace
def hello_world(lang: str, framework: str):
return llm([{"role": "user", "content": f"Write a Hello World program in {lang} using {framework}."}])
@trace
def critique_code(code: str):
return llm([{"role": "user", "content": f"How can we improve this code: \n {code}"}])
@trace(metadata={"purpose": "example"}, end_user_identifier="John Doe")
def chain(lang: str, framework: str) -> str:
return critique_code(hello_world(lang, framework))
print(chain("Python", "FastAPI"))
Deploying Prompts
Deployed prompts enable collaboration with non-engineers such as product managers & subject-matter experts.
Users can iterate, refine & test prompts on Parea's playground.
After tinkering, you can deploy that prompt which means that it is exposed via an API endpoint to integrate it into your application.
Checkout our full docs here.
from parea import Parea
from parea.schemas.models import Completion, UseDeployedPrompt, CompletionResponse, UseDeployedPromptResponse
p = Parea(api_key="<PAREA_API_KEY>")
deployment_id = '<DEPLOYMENT_ID>'
inputs = {"x": "Golang", "y": "Fiber"}
test_completion = Completion(
**{
"deployment_id": deployment_id,
"llm_inputs": inputs,
"metadata": {"purpose": "testing"}
}
)
test_get_prompt = UseDeployedPrompt(deployment_id=deployment_id, llm_inputs=inputs)
def main():
completion_response: CompletionResponse = p.completion(data=test_completion)
print(completion_response)
deployed_prompt: UseDeployedPromptResponse = p.get_prompt(data=test_get_prompt)
print("\n\n")
print(deployed_prompt)
🛡 License
This project is licensed under the terms of the Apache Software License 2.0
license.
See LICENSE for more details.
📃 Citation
@misc{parea-sdk,
author = {joel-parea-ai,joschkabraun},
title = {Parea python sdk},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/parea-ai/parea-sdk}}
}