
Judgeval is an open-source framework for building evaluation pipelines for multi-step agent workflows, supporting both real-time and experimental evaluation setups. To learn more about Judgment or sign up for free, visit our website or check out our developer docs.
pip install judgeval
You can evaluate your workflow execution data to measure quality metrics such as hallucination.
Create a file named evaluate.py with the following code:
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()

example = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
)

scorer = FaithfulnessScorer(threshold=0.5)
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4o",
)
print(results)
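To build intuition for what a thresholded scorer decides, here is a minimal standalone sketch. It does not use judgeval and is not how FaithfulnessScorer works internally (that scorer relies on a judge model such as the one named above). The functions `toy_faithfulness` and `passes` are hypothetical, and the rule "pass when the score is at least the threshold" is an assumption made for illustration.

```python
import re


def toy_faithfulness(actual_output: str, retrieval_context: list[str]) -> float:
    """Toy heuristic: fraction of output words that also occur in the context.

    This is a word-overlap stand-in, NOT judgeval's faithfulness metric.
    """
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    output_words = tokenize(actual_output)
    context_words: set[str] = set()
    for passage in retrieval_context:
        context_words |= tokenize(passage)

    if not output_words:
        return 0.0
    return len(output_words & context_words) / len(output_words)


def passes(score: float, threshold: float) -> bool:
    # Assumed semantics: an example passes when its score meets the threshold.
    return score >= threshold


score = toy_faithfulness(
    "We offer a 30-day full refund at no extra cost.",
    ["All customers are eligible for a 30 day full refund at no extra cost."],
)
print(f"score={score:.2f} passes at 0.5: {passes(score, 0.5)}")
```

Raising the threshold makes the check stricter: the same output that clears 0.5 under this toy heuristic would fail at 0.9.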
Track your workflow execution for full observability with just a few lines of code.
Create a file named traces.py with the following code:
from judgeval.common.tracer import Tracer, wrap
from openai import OpenAI

# Basic initialization
client = wrap(OpenAI())
judgment = Tracer(project_name="my_project")

# Or with S3 storage enabled
# NOTE: Make sure AWS creds correspond to an account with write access to the specified S3 bucket
judgment = Tracer(
    project_name="my_project",
    use_s3=True,
    s3_bucket_name="my-traces-bucket",  # Bucket created automatically if it doesn't exist
    s3_aws_access_key_id="your-access-key",  # Optional: defaults to AWS_ACCESS_KEY_ID env var
    s3_aws_secret_access_key="your-secret-key",  # Optional: defaults to AWS_SECRET_ACCESS_KEY env var
    s3_region_name="us-west-1",  # Optional: defaults to AWS_REGION env var or "us-west-1"
)

@judgment.observe(span_type="tool")
def my_tool():
    return "Hello world!"

@judgment.observe(span_type="function")
def main():
    task_input = my_tool()
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{task_input}"}]
    )
    return res.choices[0].message.content
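The decorator pattern above can be illustrated with a small standalone sketch. `ToyTracer` is a hypothetical class written for this example; it is not judgeval's Tracer and makes no claim about how judgeval records spans. It only shows the general idea of a decorator that captures each call's name, span type, nesting depth, and duration.

```python
import functools
import time


class ToyTracer:
    """Hypothetical in-memory tracer, for illustration only."""

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.spans = []   # one dict per observed call, in call order
        self._depth = 0   # current nesting level of observed calls

    def observe(self, span_type: str = "function"):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                span = {
                    "name": func.__name__,
                    "span_type": span_type,
                    "depth": self._depth,
                }
                self.spans.append(span)
                self._depth += 1
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    # Record the duration even if the wrapped call raises.
                    span["duration_s"] = time.perf_counter() - start
                    self._depth -= 1
            return wrapper
        return decorator


tracer = ToyTracer(project_name="my_project")


@tracer.observe(span_type="tool")
def my_tool():
    return "Hello world!"


@tracer.observe(span_type="function")
def main():
    return my_tool().upper()


result = main()
for span in tracer.spans:
    print("  " * span["depth"] + f'{span["name"]} ({span["span_type"]})')
```

Because `main` calls `my_tool`, the tool span is recorded one level below the function span, which is the parent/child structure a trace viewer would typically display.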
Apply performance monitoring to measure the quality of your systems in production, not just on historical data.
Using the same traces.py file we created earlier:
from judgeval.common.tracer import Tracer, wrap
from judgeval.scorers import AnswerRelevancyScorer
from openai import OpenAI

client = wrap(OpenAI())
judgment = Tracer(project_name="my_project")

@judgment.observe(span_type="tool")
def my_tool():
    return "Hello world!"

@judgment.observe(span_type="function")
def main():
    task_input = my_tool()
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{task_input}"}]
    ).choices[0].message.content

    judgment.get_current_trace().async_evaluate(
        scorers=[AnswerRelevancyScorer(threshold=0.5)],
        input=task_input,
        actual_output=res,
        model="gpt-4o"
    )

    return res
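One way to think about scoring production traffic is that the evaluation runs off the request path, so the caller gets its answer without waiting for the score. The sketch below illustrates that general pattern with a thread pool. It is an assumption about the idea, not a description of how `async_evaluate` is implemented; `toy_relevancy` and `handle_request` are hypothetical names, and the word-overlap score is a stand-in for a real relevancy scorer.

```python
import re
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
pending = []  # futures for evaluations that were queued in the background


def toy_relevancy(question: str, answer: str) -> float:
    """Toy stand-in: fraction of question words that reappear in the answer."""
    q_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    a_words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)


def handle_request(task_input: str) -> str:
    # Placeholder for the model call; a real system would query an LLM here.
    res = f"Echo: {task_input}"
    # Queue the evaluation and return immediately instead of waiting for it.
    pending.append(executor.submit(toy_relevancy, task_input, res))
    return res


answer = handle_request("Hello world!")
scores = [future.result() for future in pending]  # collected later, off the hot path
executor.shutdown()
print(answer, scores)
```

In a real deployment the scores would be shipped to a monitoring backend rather than collected in a list; the list here only keeps the example self-contained.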
For more detailed documentation, please check out our docs and some of our demo videos for reference!