EasyEval is a fully open-source evaluation wrapper that streamlines the integration, customization, and extension of robust evaluation engines like lm-eval-harness and bigcode-eval-harness into existing production-grade or research pipelines. It supports over 200 existing datasets and can easily be adapted to custom ones, making it a versatile way to add evaluation to your workflow.
Evaluation has been an open problem for LLMs. When putting LLMs into production, we need to rely on a range of evaluation techniques. However, the problem we often face is integrating good evaluation engines into existing production LLM pipelines.
So what are the solutions?
There are a handful of open-source libraries that run evaluation on large-scale evaluation datasets, such as lm-eval-harness and bigcode-eval-harness.
Beyond those, there are many more evaluation libraries, a large share of which are extensions of the engines above. These engines work by defining a taxonomy of how they evaluate.
For example, LM Evaluation Harness by EleutherAI defines different tasks, and under each task there are different datasets. The "test/evaluation" split of each dataset is used to evaluate the LLM of choice.
The problem with these evaluators is that most of them are CLI-first and expose very little documentation on their actual API interfaces. These libraries become far more useful when they can be easily integrated, extended, or customized with newer tasks in existing production pipelines, such as:
And many more like these. (A minimal sketch of calling one of these engines programmatically is shown below.)
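For context, engines like lm-eval-harness do ship a Python entry point, but wiring it into a pipeline is left to you. Here is a rough sketch of a direct call via its simple_evaluate function; the argument names follow the v0.4.x API, and depending on the installed version you may need extra setup (such as registering tasks first), so treat it as illustrative rather than exact.

from lm_eval.evaluator import simple_evaluate

# Rough sketch of calling lm-eval-harness directly (v0.4.x-style API).
# Exact setup varies by release; some versions require initializing the task
# registry before this call.
results = simple_evaluate(
    model="hf",                    # Hugging Face model backend
    model_args="pretrained=gpt2",  # which checkpoint to load
    tasks=["hellaswag"],           # one of the harness's predefined tasks
    limit=10,                      # only evaluate on 10 datapoints
    device="cpu",
)
print(results["results"])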
This library acts as a wrapper combining both engines, lm-eval-harness (mostly consisting of evaluation datasets across different general tasks) and bigcode-eval-harness (evaluation datasets exclusively for code-generation tasks), with common interfaces. The features of the library include:
Let's get started by installing the library. Open a terminal, create a new virtual environment, and install easyeval:
pip install easy_evaluator
🚧 Usage documentation is still in progress 🚧
The very first version includes a simple interface to interact with the lm-eval-harness engine. Here is how you can use it.
from easy_eval import HarnessEvaluator
from easy_eval.config import EvaluatorConfig
EvaluatorConfig is where you provide your model's generation configuration. You can check out all the configs here. After this, we instantiate our evaluator.
harness = HarnessEvaluator(model_name_or_path="gpt2", model_backend="huggingface", device="cpu")
# For device, you can set cpu or cuda, following the standard way of specifying devices.
HarnessEvaluator expects you to provide the model_backend. Here are some of the supported backends:
It also expects model_name_or_path, which is the name of the model (if it is a Hugging Face repo) or the path to the model for the corresponding model_backend.
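For instance, keeping the huggingface backend shown above, moving to a GPU or a different checkpoint only changes the constructor arguments. A minimal sketch reusing only the parameters documented above:

# Same huggingface backend as before, a different Hugging Face repo id,
# and device="cuda" (see the device comment above).
harness_gpu = HarnessEvaluator(
    model_name_or_path="gpt2-medium",  # any Hugging Face repo id
    model_backend="huggingface",
    device="cuda",
)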
Once we have instantiated our evaluator, we define our config. Defining a config is fully optional; if we do not pass one, the default config values will be chosen.
config = EvaluatorConfig(
limit=10 # the number of datapoints to take for evaluation
)
And now we get our evaluation result by passing the config and list of evaluation tasks, we want our model to evaluate on.
results = harness.evaluate(
tasks=["babi"],
config=config, show_results_terminal=True
)
print(results)
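As noted above, the config is fully optional; skipping it simply means the evaluator runs with its default config values. A minimal sketch reusing the same harness and task:

# Config omitted: the default config values are used (see the note above).
results_default = harness.evaluate(tasks=["babi"])
print(results_default)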
This will return the result in JSON format.
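Since the result comes back as JSON, it is easy to persist it alongside your pipeline artifacts. A small sketch; it handles both a dict and an already-serialized string, since the exact return type is not pinned down above.

import json

# Persist the evaluation results returned above as a JSON file.
# If results is already a JSON string, write it as-is; otherwise serialize the dict.
with open("harness_results.json", "w") as f:
    f.write(results if isinstance(results, str) else json.dumps(results, indent=2))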
easyeval is at a super early stage right now. You can check out the roadmap to see what features are expected to come in the future.
This is a fully open-source project, so contributions are highly appreciated; the repository describes how you can contribute. If you use the underlying evaluation engines, cite them as follows:
@misc{eval-harness,
author = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = 12,
year = 2023,
publisher = {Zenodo},
version = {v0.4.0},
doi = {10.5281/zenodo.10256836},
url = {https://zenodo.org/records/10256836}
}
@misc{bigcode-evaluation-harness,
author = {Ben Allal, Loubna and
Muennighoff, Niklas and
Kumar Umapathi, Logesh and
Lipkin, Ben and
von Werra, Leandro},
title = {A framework for the evaluation of code generation models},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/bigcode-project/bigcode-evaluation-harness}},
year = 2022,
}