virtualhome-eval

Embodied agent interface evaluation for VirtualHome

Version: 0.1.1 (pip / PyPI)
Maintainers: 1

Installation and Usage Guide for virtualhome-eval

Installation

pip install virtualhome_eval

Usage

To run virtualhome_eval, either call it from Python or use the command-line interface:

  • Use in Python:
from virtualhome_eval.agent_eval import agent_evaluation
agent_evaluation(mode=[generate_prompts, evaluate_results], eval_type=[goal_interpretation, action_sequence, transition_modeling], llm_response_path=[YOUR LLM OUTPUT DIR])
  • Use directly from the command line:
virtualhome-eval --mode [generate_prompts, evaluate_results] --eval-type [goal_interpretation, action_sequence] --llm-response-path [YOUR LLM OUTPUT DIR] --output-dir [YOUR EVAL OUTPUT DIR]

Parameters

  • mode: Specifies whether to generate prompts or evaluate results. Options are:
    • generate_prompts
    • evaluate_results
  • eval_type: Specifies the evaluation task type. Options are:
    • goal_interpretation
    • action_sequence
    • subgoal_decomposition
    • transition_model
  • llm_response_path: Path to the directory of LLM outputs to be evaluated. It is "" by default, which uses the existing outputs under virtualhome_eval/llm_response/. All LLM outputs under the directory are evaluated.
  • dataset: The dataset type. Options are:
    • virtualhome
    • behavior
  • output_dir: The directory in which to store evaluation results. By default it is output/ under the current path. A sketch combining these parameters appears after this list.
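
For reference, here is a minimal sketch of a single evaluation call that combines these parameters. It assumes dataset, llm_response_path, and output_dir are accepted as keyword arguments of agent_evaluation, mirroring the documented parameters and CLI flags above; verify against the installed version before relying on it.

from virtualhome_eval.agent_eval import agent_evaluation

# Evaluate saved LLM outputs for the action_sequence task on the
# virtualhome dataset, writing results under my_eval_output/.
# NOTE: passing dataset and output_dir as keyword arguments is an
# assumption based on the documented parameters and CLI flags; check
# the signature of agent_evaluation in your installed version.
results = agent_evaluation(
    mode='evaluate_results',
    eval_type='action_sequence',
    dataset='virtualhome',
    llm_response_path='my_llm_outputs/',  # "" would use the bundled outputs
    output_dir='my_eval_output/',
)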

Example usage in Python

  • To generate prompts for goal_interpretation:
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')
  • To evaluate LLM outputs for goal_interpretation:
results = agent_evaluation(mode='evaluate_results', eval_type='goal_interpretation')
  • To generate prompts for action_sequence:
agent_evaluation(mode='generate_prompts', eval_type='action_sequence')
  • To evaluate LLM outputs for action_sequence:
results = agent_evaluation(mode='evaluate_results', eval_type='action_sequence')
  • To generate VirtualHome prompts for transition_model:
agent_evaluation(mode='generate_prompts', eval_type='transition_model')
  • To evaluate LLM outputs on VirtualHome for transition_model:
results = agent_evaluation(mode='evaluate_results', eval_type='transition_model')
  • To generate prompts for subgoal_decomposition:
agent_evaluation(mode='generate_prompts', eval_type='subgoal_decomposition')
  • To evaluate LLM outputs for subgoal_decomposition:
results = agent_evaluation(mode='evaluate_results', eval_type='subgoal_decomposition')
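
Putting the two modes together, a typical workflow is to generate prompts, run them through your own LLM, save the responses into a directory, and then evaluate that directory. The sketch below is illustrative only: the inference step and the my_llm_outputs/ directory are hypothetical placeholders, since the expected response format depends on the prompt files the package generates.

from virtualhome_eval.agent_eval import agent_evaluation

# 1. Generate prompts for the task of interest.
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')

# 2. (Hypothetical) Run the generated prompts through your own LLM and
#    write its responses into a directory, e.g. my_llm_outputs/.
#    This step is your own inference code and is not part of the package.

# 3. Evaluate the saved LLM outputs from that directory.
results = agent_evaluation(
    mode='evaluate_results',
    eval_type='goal_interpretation',
    llm_response_path='my_llm_outputs/',
)
print(results)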
