github.com/reka-ai/reka-vibe-eval

Vibe-Eval

A benchmark for evaluating multimodal chat models, including especially challenging examples.

Example from the dataset

Dataset

The dataset, including all images, can be downloaded from the Releases page of this repo.

The dataset is stored as a JSONL file: data/vibe-eval.v1.jsonl. Each example has the following fields:

  • example_id: a unique ID for the example
  • category: the category that this example belongs to, either difficulty-normal or difficulty-hard
  • prompt: the user prompt
  • reference: a golden reference answer for the prompt
  • media_filename: the name of the file in the dataset
  • media_url: a URL where the file is hosted publicly
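The following is a minimal sketch for loading the dataset and checking it against the field list above. It assumes only what this README states: one JSON object per line in data/vibe-eval.v1.jsonl, with the six listed fields. The helper names (parse_examples, load_examples, count_by_category) are hypothetical and are not part of this repo.

```python
import json
from collections import Counter

# Fields listed in this README for each example (assumed to all be present).
REQUIRED_FIELDS = {"example_id", "category", "prompt", "reference", "media_filename", "media_url"}


def parse_examples(lines):
    """Parse JSONL lines into dicts, raising ValueError if a record lacks a listed field."""
    examples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        examples.append(record)
    return examples


def load_examples(path="data/vibe-eval.v1.jsonl"):
    """Read the dataset file from disk and parse it line by line."""
    with open(path, encoding="utf-8") as f:
        return parse_examples(f)


def count_by_category(examples):
    """Count examples per category (e.g. difficulty-normal vs difficulty-hard)."""
    return Counter(ex["category"] for ex in examples)
```

For example, count_by_category(load_examples()) gives the split between normal and hard examples. Images are referenced by media_filename (for the downloaded release) or media_url (for the hosted copy); this sketch does not fetch them.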

Running the evaluation

To run the evaluation, use evaluate.py as follows:

python evaluate.py generations.jsonl -o out.jsonl

Before running it, install the requirements (including the Reka API package) with pip install -r requirements.txt.

The generations.jsonl file should contain your model's generations: a JSONL file in which each line is a JSON object with the keys "generation" and "example_id" (the latter matching the example_id values in the dataset).

This will output detailed results to out.jsonl and will also print a table of final results to stdout.
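Below is a minimal sketch for producing a generations.jsonl file in the format described above, plus a sanity check that its example_id values line up with the dataset. It assumes your generations are already collected in a Python dict mapping example_id to generated text; the helper names are hypothetical. Whether evaluate.py requires every dataset example to have a generation is not stated here, so the coverage check only reports gaps rather than enforcing anything.

```python
import json


def generation_lines(generations):
    """Yield one JSONL line per {example_id: generation} entry, using the keys evaluate.py expects."""
    for example_id, text in generations.items():
        # ensure_ascii=False keeps non-ASCII model output readable in the file
        yield json.dumps({"example_id": example_id, "generation": text}, ensure_ascii=False)


def write_generations(generations, path="generations.jsonl"):
    """Write the generations to a JSONL file, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in generation_lines(generations):
            f.write(line + "\n")


def check_coverage(lines, dataset_ids):
    """Return (missing, unknown): dataset IDs with no generation, and generation IDs not in the dataset."""
    seen = set()
    for line in lines:
        if line.strip():
            seen.add(json.loads(line)["example_id"])
    dataset_ids = set(dataset_ids)
    return dataset_ids - seen, seen - dataset_ids
```

A typical flow would be to call write_generations({ex["example_id"]: my_model_generate(ex) for ex in examples}) (where my_model_generate is a placeholder for your own model call), then run python evaluate.py generations.jsonl -o out.jsonl as shown above.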

Package last updated on 01 May 2024
