
Evaluation tools for AI components, functions, workflows, and agents. Built on evalite, with cloud storage integration.
Installation
npm install evals.do
yarn add evals.do
pnpm add evals.do
Usage
import { evals, EvalsClient } from 'evals.do'

// Create a test with the default client
const test = await evals.createTest({
  name: 'My Test',
  input: { prompt: 'Hello, world!' },
  expected: { response: 'Hi there!' },
})

// Create a client with custom configuration
const customClient = new EvalsClient({
  baseUrl: 'https://custom-evals.do',
  apiKey: 'your-api-key',
  storeLocally: true,
  storeRemotely: true,
  dbPath: './my-evals.db',
})

// Create several tests in parallel
const tests = await Promise.all([
  customClient.createTest({
    name: 'Test 1',
    input: { prompt: 'Tell me a joke' },
    expected: { type: 'joke' },
  }),
  customClient.createTest({
    name: 'Test 2',
    input: { prompt: 'What is the capital of France?' },
    expected: { answer: 'Paris' },
  }),
])
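
// Executor: receives a test's input and returns the output to evaluate
const executor = {
  execute: async (input: any) => {
    return { response: `Processed: ${input.prompt}` }
  },
}

// Metric: 1 if the response matches the expected response exactly, otherwise 0
const metrics = {
  accuracy: {
    calculate: (result: any, expected: any) => {
      return result.response === expected.response ? 1 : 0
    },
  },
}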

// Run the executor against the tests and score the outputs with the metrics
const results = await customClient.evaluate(executor, tests, {
  metrics,
  concurrency: 1,
  timeout: 30000,
})

console.log(`Evaluation complete: ${results.id}`)
console.log(`Results: ${JSON.stringify(results.results, null, 2)}`)
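
The accuracy metric in the usage example compares only the response field, while the two sample tests define expected.type and expected.answer instead. A metric that checks whichever fields the test declares might be sketched as follows. This is an illustration only: fieldMatch and the Fields type are hypothetical names, not part of evals.do, and it assumes evaluate accepts any metric with the calculate(result, expected) shape shown above.

```typescript
// Hypothetical helper, not part of evals.do: scores the fraction of
// expected fields that the result reproduces exactly.
type Fields = Record<string, unknown>

const fieldMatch = {
  calculate: (result: Fields, expected: Fields): number => {
    const keys = Object.keys(expected)
    if (keys.length === 0) return 1 // nothing to check counts as a pass
    // Strict equality: primitives compare by value, nested objects by reference
    const hits = keys.filter((key) => result[key] === expected[key]).length
    return hits / keys.length
  },
}

// Same { metricName: { calculate } } shape as the metrics object in the usage example
const fieldMetrics = { fieldMatch }

const score = fieldMetrics.fieldMatch.calculate(
  { answer: 'Paris', confidence: 0.9 }, // extra result fields are ignored
  { answer: 'Paris' },
)
console.log(score) // 1
```

Returning a fraction rather than 0 or 1 keeps partially correct outputs distinguishable from complete misses.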
API Reference
EvalsClient
The main client for creating and running evaluations.
Constructor
new EvalsClient(options?: EvalsOptions)
Options:
- baseUrl: The URL of the evals.do API (default: 'https://evals.do')
- apiKey: Your API key for authentication
- storeLocally: Whether to store data locally (default: true)
- storeRemotely: Whether to store data remotely (default: true)
- dbPath: Path to the local SQLite database (default: './node_modules/evalite/.evalite.db')
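
To make the documented defaults concrete, the sketch below shows which values would apply when some options are omitted. It does not use the package: EvalsOptionsSketch is a local stand-in reconstructed from the option list, and resolveOptions is a hypothetical helper. How the real client merges options internally is not stated here, so treat this as an assumption.

```typescript
// Local stand-in for the package's EvalsOptions type, reconstructed from
// the documented option list; the real type may differ.
interface EvalsOptionsSketch {
  baseUrl?: string
  apiKey?: string
  storeLocally?: boolean
  storeRemotely?: boolean
  dbPath?: string
}

// Defaults as documented (apiKey has no documented default)
const documentedDefaults = {
  baseUrl: 'https://evals.do',
  storeLocally: true,
  storeRemotely: true,
  dbPath: './node_modules/evalite/.evalite.db',
}

// Hypothetical helper: caller-supplied options override the defaults.
// Note that an explicitly passed undefined would also override a default.
function resolveOptions(options: EvalsOptionsSketch = {}) {
  return { ...documentedDefaults, ...options }
}

const resolved = resolveOptions({ storeRemotely: false, dbPath: './my-evals.db' })
console.log(resolved.baseUrl) // https://evals.do
console.log(resolved.storeRemotely) // false
console.log(resolved.dbPath) // ./my-evals.db
```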
Methods
- createTest(test: Partial<Test>): Promise<Test> - Create a new test
- getTest(id: string): Promise<Test | null> - Get a test by ID
- createResult(result: Partial<Result>): Promise<Result> - Create a new result
- getResult(id: string): Promise<Result | null> - Get a result by ID
- createRun(run: Partial<TestRun>): Promise<TestRun> - Create a new test run
- getRun(id: string): Promise<TestRun | null> - Get a run by ID
- evaluate<T, R>(executor: TaskExecutor<T, R>, tests: Test[], options?: EvaluationOptions): Promise<TestRun> - Run an evaluation
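
The getTest, getResult, and getRun signatures resolve to a value or null, presumably null when no record with the given ID exists (the conditions are not spelled out here, so that is an assumption). One way to handle the null case in calling code is a small guard like the one sketched below. requireFound, store, and lookup are hypothetical names used for illustration; the in-memory lookup stands in for the client so the sketch runs on its own.

```typescript
// Hypothetical helper, not part of evals.do: turns a `T | null` lookup
// result into a `T`, throwing a descriptive error on null.
function requireFound<T>(value: T | null, description: string): T {
  if (value === null) {
    throw new Error(`Not found: ${description}`)
  }
  return value
}

// Stand-in lookup so the sketch runs without the package or a database
const store = new Map<string, { id: string; name: string }>([
  ['t1', { id: 't1', name: 'Test 1' }],
])
const lookup = (id: string) => store.get(id) ?? null

console.log(requireFound(lookup('t1'), 'test t1').name) // Test 1
```

With a real client, the same guard could wrap an awaited call, for example requireFound(await customClient.getTest(id), 'test ' + id).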
Contributing
We welcome contributions! Please see our Contributing Guide for more details.
License
MIT
Dependencies
- apis.do - Unified API Gateway for all domains and services in the .do ecosystem