# Vibe-Eval

A benchmark for evaluating multimodal chat models, including especially challenging examples.
## Dataset
The dataset, including all images, can be downloaded from the Releases page of this repo.
The dataset is stored as a JSONL file: `data/vibe-eval.v1.jsonl`.
Each example has the following fields (see the loading sketch below):

- `example_id`: a unique ID for the example
- `category`: the category that this example belongs to, either `difficulty-normal` or `difficulty-hard`
- `prompt`: the user prompt
- `reference`: a golden reference answer for the prompt
- `media_filename`: the name of the file in the dataset
- `media_url`: a URL where the file is hosted publicly
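
As a quick sanity check, here is a minimal sketch for loading the dataset with the Python standard library. It assumes you run it from the repo root after downloading the release, and relies only on the fields listed above.

```python
import json
from collections import Counter

# Load every example from the released JSONL file.
with open("data/vibe-eval.v1.jsonl") as f:
    examples = [json.loads(line) for line in f]

# Peek at one record; field names follow the list above.
first = examples[0]
print(first["example_id"], first["category"], first["media_filename"])
print(first["prompt"][:80])

# Count how many examples fall into each difficulty category.
print(Counter(ex["category"] for ex in examples))
```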
## Running the evaluation
To run the evaluation, use `evaluate.py` as follows:

```bash
python evaluate.py generations.jsonl -o out.jsonl
```
(You will have to install a couple of requirements, including the Reka API package, with `pip install -r requirements.txt`.)
The `generations.jsonl` file is expected to contain your model's generations. It should be a JSONL file where each line is a JSON object with keys `"generation"` and `"example_id"` (matching the dataset); see the sketch below for one way to produce it.
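
For instance, here is a minimal sketch that writes a generations file in the expected format; `run_model` is a hypothetical placeholder for whatever actually produces your model's answer:

```python
import json

def run_model(prompt: str, media_url: str) -> str:
    """Hypothetical placeholder: replace with your model's inference call."""
    return "model answer goes here"

with open("data/vibe-eval.v1.jsonl") as f_in, open("generations.jsonl", "w") as f_out:
    for line in f_in:
        example = json.loads(line)
        generation = run_model(example["prompt"], example["media_url"])
        # Each output line needs only these two keys; example_id must match the dataset.
        f_out.write(json.dumps({
            "example_id": example["example_id"],
            "generation": generation,
        }) + "\n")
```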
This will output detailed results to `out.jsonl` and also print a table of final results to stdout.
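
If you want to post-process the detailed results yourself, here is a minimal sketch that only assumes each line of `out.jsonl` is a JSON object (the exact field names are not specified here):

```python
import json

# Read the per-example results written by evaluate.py.
with open("out.jsonl") as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} evaluated examples")
# Inspect which fields the evaluator actually wrote.
print("fields per record:", sorted(records[0].keys()))
```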