# llm

Package to make and trace LLM calls.
## Usage

```typescript
import { LLM } from "@empiricalrun/llm";

const llm = new LLM({
  provider: "openai",
  defaultModel: "gpt-4o",
});

const llmResponse = await llm.createChatCompletion({ ... });
```
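The `{ ... }` above is left open by the README. Assuming `createChatCompletion` mirrors the OpenAI chat-completions parameters (an assumption, not something this document specifies), a call might look like the sketch below.

```typescript
// Hypothetical request shape, assuming OpenAI-style chat parameters.
const request = {
  model: "gpt-4o", // assumed per-call override of defaultModel
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize this test run in one line." },
  ],
};

// const llmResponse = await llm.createChatCompletion(request);
```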
## Vision utilities

This package also contains vision utilities, e.g. to extract text from an image (OCR).
```typescript
import { extractText } from "@empiricalrun/llm/vision";

const data = await driver.saveScreenshot("dummy.png");
const instruction = "Extract number of ATOM tokens from the image. Return only the number.";
const text = await extractText(data.toString("base64"), instruction);
```
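Since the instruction asks the model to return only a number, the caller typically parses the reply. A minimal sketch, with `reply` standing in for the model's actual output:

```typescript
// Hypothetical model reply for the instruction above.
const reply = "42.5";

// Parse the reply into a number, guarding against non-numeric output.
const tokens = Number.parseFloat(reply.trim());
if (Number.isNaN(tokens)) {
  throw new Error(`Unexpected OCR output: ${reply}`);
}
```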
### Get bounding boxes
```typescript
import { getBoundingBox } from "@empiricalrun/llm/vision";

const data = await driver.saveScreenshot("dummy.png");
const instruction =
  "This screenshot shows a screen to send crypto tokens. What is the bounding box for the dropdown to select the token?";
const bbox = await getBoundingBox(data.toString("base64"), instruction);
const centerToTap = bbox.center;
```
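`bbox.center` gives the point to tap without computing it yourself; the arithmetic is just the midpoint of the box corners. The `Box` shape below is a hypothetical illustration, not the package's exported type.

```typescript
// Hypothetical bounding-box shape in pixel coordinates.
interface Box {
  x1: number; // left
  y1: number; // top
  x2: number; // right
  y2: number; // bottom
}

// Midpoint of the box, e.g. the point to tap in a UI test.
function centerOf(box: Box): { x: number; y: number } {
  return { x: (box.x1 + box.x2) / 2, y: (box.y1 + box.y2) / 2 };
}
```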
Getting a good bounding box can require some prompt iterations, and the `debug` flag helps with that. This flag copies the output of the operation to your clipboard (macOS only). Paste the output into your browser's address bar to visualize it.

```typescript
const bbox = await getBoundingBox(data.toString("base64"), instruction, { debug: true });
```
Below is an example script for prompt iteration. The package must be installed and importable from this script.
```typescript
import fs from "fs";

import { getBoundingBox } from "@empiricalrun/llm/vision";

async function main() {
  const prompt = "What is the bounding box for the first dropdown menu?";
  const imagePath = "/path/to/the/image.png";
  const imageData = fs.readFileSync(imagePath).toString("base64");
  await getBoundingBox(imageData, prompt, { debug: true });
}

main();
```