
pip install virtualhome_eval
To run virtualhome_eval from Python:
from virtualhome_eval.agent_eval import agent_evaluation
agent_evaluation(mode=[generate_prompts, evaluate_results], eval_type=[goal_interpretation, action_sequence, subgoal_decomposition, transition_model], llm_response_path=[YOUR LLM OUTPUT DIR])
Or from the command line:
virtualhome-eval --mode [generate_prompts, evaluate_results] --eval-type [goal_interpretation, action_sequence] --llm-response-path [YOUR LLM OUTPUT DIR] --output-dir [YOUR EVAL OUTPUT DIR]
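As a rough illustration of the command-line form above, the following sketch assembles the argument list for a single run. The helper name build_cli_args is hypothetical and not part of virtualhome_eval. The sketch also assumes that omitting --llm-response-path or --output-dir makes the tool fall back to its documented defaults, which has not been verified here.

```python
from typing import List, Optional

# Modes documented for virtualhome-eval.
MODES = ("generate_prompts", "evaluate_results")


def build_cli_args(
    mode: str,
    eval_type: str,
    llm_response_path: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the virtualhome-eval command (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    # eval_type is passed through unchecked; the tool itself decides what it accepts.
    args = ["virtualhome-eval", "--mode", mode, "--eval-type", eval_type]
    # Assumption: leaving a flag out lets the tool use its default value.
    if llm_response_path:
        args += ["--llm-response-path", llm_response_path]
    if output_dir:
        args += ["--output-dir", output_dir]
    return args
```

The returned list can be handed to subprocess.run(args, check=True) once the package is installed and the virtualhome-eval command is on the PATH.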
mode: Specifies whether to generate prompts or evaluate results. Options are:
  generate_prompts
  evaluate_results
eval_type: Specifies the evaluation task type. Options are:
  goal_interpretation
  action_sequence
  subgoal_decomposition
  transition_model
llm_response_path: The path of the LLM output directory to be evaluated. It is "" by default, in which case the existing outputs under virtualhome_eval/llm_response/ are used. The function evaluates all LLM outputs under the directory.
dataset: The dataset type. Options are:
  virtualhome
  behavior
output_dir: The directory in which to store the output results. By default, it is output/ under the current path.

Generate prompts for goal_interpretation:
agent_evaluation(mode='generate_prompts', eval_type='goal_interpretation')
Evaluate results for goal_interpretation:
results = agent_evaluation(mode='evaluate_results', eval_type='goal_interpretation')
Generate prompts for action_sequence:
agent_evaluation(mode='generate_prompts', eval_type='action_sequence')
Evaluate results for action_sequence:
results = agent_evaluation(mode='evaluate_results', eval_type='action_sequence')
Generate prompts for transition_model:
agent_evaluation(mode='generate_prompts', eval_type='transition_model')
Evaluate results for transition_model:
results = agent_evaluation(mode='evaluate_results', eval_type='transition_model')
Generate prompts for subgoal_decomposition:
agent_evaluation(mode='generate_prompts', eval_type='subgoal_decomposition')
Evaluate results for subgoal_decomposition:
results = agent_evaluation(mode='evaluate_results', eval_type='subgoal_decomposition')
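The per-task calls above repeat the same pattern, so they could be wrapped in a small loop. The sketch below is one possible way to do that, not part of the package: evaluate_all is a hypothetical helper, and the evaluation function is passed in as an argument so the helper does not depend on virtualhome_eval being installed. It assumes agent_evaluation accepts the documented parameters as keyword arguments and returns a result object per call.

```python
from typing import Any, Callable, Dict, Iterable

# Evaluation task types listed in the parameter description.
EVAL_TYPES = (
    "goal_interpretation",
    "action_sequence",
    "subgoal_decomposition",
    "transition_model",
)


def evaluate_all(
    evaluate: Callable[..., Any],
    eval_types: Iterable[str] = EVAL_TYPES,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Call `evaluate` in evaluate_results mode once per task type (hypothetical helper).

    Extra keyword arguments (for example llm_response_path, dataset, output_dir)
    are forwarded unchanged to every call.
    """
    results: Dict[str, Any] = {}
    for eval_type in eval_types:
        results[eval_type] = evaluate(
            mode="evaluate_results", eval_type=eval_type, **kwargs
        )
    return results
```

With the package installed, a call might look like evaluate_all(agent_evaluation, dataset='virtualhome'), where agent_evaluation is imported from virtualhome_eval.agent_eval as shown earlier.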
FAQs
Embodied agent interface evaluation for VirtualHome
We found that virtualhome-eval demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.