# Curator Evals

A package for evaluating coding curators.
## Installation

```bash
pip install curator-evals
```
## Usage

```python
from curator_evals.code_curator_eval.evaluation_classification_code_curator import evaluate_model

evaluate_model(config_path="configs/code_curator_eval.yaml")
```
## Configuration

Evaluation runs are configured through a YAML file. Example config:
```yaml
model:
  name: "accounts/fireworks/models/llama-v3p3-70b-instruct"
  api_key: ${FIREWORKS_API_KEY}
  base_url: "https://api.fireworks.ai/inference/v1"
  max_tokens: 2048
  temperature: 0

dataset:
  name: "collinear-ai/coding_curators_evaluation_dataset"
  splits: ["humanevalpack", "mbpp"]
  input_column: "instruction"
  output_column: "solution"
  label_column: "label"
  total_rows: 20

output:
  file: "../data/classification_humanevalpack_eval"
```
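The `api_key` field references an environment variable. Assuming the `${FIREWORKS_API_KEY}` placeholder is resolved from the environment when the config is loaded, the key must be available before the evaluation runs, for example:

```python
import os

# Assumption: the ${FIREWORKS_API_KEY} reference in the YAML config is
# expanded from the environment, so the key has to be set before
# evaluate_model() is called (exporting it in your shell works as well).
os.environ["FIREWORKS_API_KEY"] = "<your-fireworks-api-key>"
```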
If no dataset is specified, the evaluation falls back to the default dataset, which is currently both splits of `collinear-ai/coding_curators_evaluation_dataset`.
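To inspect the evaluation data itself, the default dataset can be loaded with the Hugging Face `datasets` library. This is only a sketch; the split and column names below are taken from the example config above and are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Assumed split and column names, copied from the example config.
ds = load_dataset("collinear-ai/coding_curators_evaluation_dataset", split="humanevalpack")
print(ds[0]["instruction"])
print(ds[0]["solution"], ds[0]["label"])
```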
To run an evaluation, import the function and pass it the path to your config file (relative or absolute):

```python
from curator_evals.code_curator_eval.evaluation_classification_code_curator import evaluate_model

evaluate_model(config_path="/path/to/curator-evals/configs/code_curator_eval.yaml")
```