WebArena benchmark for BrowserGym
This package provides browsergym.webarena, an unofficial port of the WebArena benchmark for BrowserGym.
Note: the original WebArena codebase has been slightly adapted to ensure compatibility.
Server installation
You have two options to set up your webarena instance:
We recommend option 2, as it lets you easily customize the ports of each webarena domain and offers a reset functionality that allows browsergym to trigger a full instance reset remotely.
Setup
- Install the package
pip install browsergym-webarena
- Download tokenizer resources
python -c "import nltk; nltk.download('punkt_tab')"
- Set up the URLs as environment variables. The ports for each domain here should correspond to those you used when setting up your webarena instance. Note also the WA_ prefix, which is specific to browsergym.
BASE_URL=<YOUR_SERVER_URL_HERE>
export WA_SHOPPING="$BASE_URL:8082/"
export WA_SHOPPING_ADMIN="$BASE_URL:8083/admin"
export WA_REDDIT="$BASE_URL:8080"
export WA_GITLAB="$BASE_URL:9001"
export WA_WIKIPEDIA="$BASE_URL:8081/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export WA_MAP="$BASE_URL:443"
export WA_HOMEPAGE="$BASE_URL:80"
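# Optional: use this only if your webarena instance offers the full-reset feature
export WA_FULL_RESET="$BASE_URL:7565"
# Otherwise, set WA_FULL_RESET to an empty string
export WA_FULL_RESET=""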
- Set up an OpenAI API key
export OPENAI_API_KEY=...
NOTE: be mindful of costs, as WebArena calls GPT-4 for certain evaluations (llm_fuzzy_match).
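Before launching any task, it can help to confirm that the variables from the setup steps above are actually visible to Python. The following is a minimal sketch, not part of browsergym: the helper name check_setup is hypothetical, and treating WA_FULL_RESET as "must be defined, may be empty" is an assumption based on the empty export shown above.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = [
    "WA_SHOPPING",
    "WA_SHOPPING_ADMIN",
    "WA_REDDIT",
    "WA_GITLAB",
    "WA_WIKIPEDIA",
    "WA_MAP",
    "WA_HOMEPAGE",
    "OPENAI_API_KEY",
]

# Assumption: WA_FULL_RESET has to be defined, but an empty string is acceptable.
MAY_BE_EMPTY = ["WA_FULL_RESET"]


def check_setup(env=None):
    """Hypothetical helper: return a list of problems found in the environment.

    An empty list means every expected variable looks set.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        # Missing and empty values are both reported for required variables.
        if not env.get(name):
            problems.append(f"{name} is missing or empty")
    for name in MAY_BE_EMPTY:
        # Only a completely undefined variable is reported here.
        if name not in env:
            problems.append(f"{name} is not defined (use an empty string to disable it)")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("setup problem:", issue)
    if not issues:
        print("all expected variables are set")
```

Running the script in the same shell where the export commands were executed lists any variable that did not make it into the environment. It only checks presence and never prints the values, so the API key is not echoed to the terminal.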