
Research
PyPI Package Disguised as Instagram Growth Tool Harvests User Credentials
A deceptive PyPI package posing as an Instagram growth tool collects user credentials and sends them to third-party bot services.
STaRK is a large-scale Semi-structured Retrieval Benchmark on Textual and Relational Knowledge bases, covering applications in product search, academic paper search, and biomedicine inquiries.
Featuring diverse, natural-sounding, and practical queries that require context-specific reasoning, STaRK sets a new standard for assessing real-world retrieval systems driven by LLMs and presents significant challenges for future research.
🔥 Check out our website for more overview!
With python >=3.8 and <3.12
pip install stark-qa
Create a conda env with python >=3.8 and <3.12 and install required packages in requirements.txt
.
conda create -n stark python=3.11
conda activate stark
pip install -r requirements.txt
from stark_qa import load_qa, load_skb
dataset_name = 'amazon'
# Load the retrieval dataset
qa_dataset = load_qa(dataset_name)
idx_split = qa_dataset.get_idx_split()
# Load the semi-structured knowledge base
skb = load_skb(dataset_name, download_processed=True, root=None)
The root argument for load_skb specifies the location to store SKB data. With default value None
, the data will be stored in huggingface cache.
Question answer pairs for the retrieval task will be automatically downloaded in data/{dataset}/stark_qa
by default. We provided official split in data/{dataset}/split
.
There are two ways to load the knowledge base data:
download_processed=True
.download_processed=False
. In this case, STaRK-PrimeKG takes around 5 minutes to download and load the processed data. STaRK-Amazon and STaRK-MAG may takes around an hour to process from the raw data.If you are running eval, you may install the following packages:
pip install llm2vec gritlm bm25
Our evaluation requires embed the node documents into candidate_emb_dict.pt
, which is a dictionary node_id -> torch.Tensor
. Query embeddings will be automatically generated if not available. You can either run the following the python script to download query embeddings and document embeddings generated by text-embedding-ada-002
. (We provide them so you can run on our benchmark right away.)
python emb_download.py --dataset amazon --emb_dir emb/
Or you can run the following code to generate the query or document embeddings by yourself. E.g.,
python emb_generate.py --dataset amazon --mode query --emb_dir emb/ --emb_model text-embedding-ada-002
dataset
: one of amazon
, mag
or prime
.mode
: the content to embed, one of query
or doc
(node documents).emb_dir
: the directory to store embeddings.emb_model
: the LLM name to generate embeddings, such as text-embedding-ada-002
, text-embedding-3-large
, , voyage-large-2-instruct
, GritLM/GritLM-7B
, McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
emb_generate.py
for other arguments.Run the python script for evaluation. E.g.,
python eval.py --dataset amazon --model VSS --emb_dir emb/ --output_dir output/ --emb_model text-embedding-ada-002 --split test --save_pred
python eval.py --dataset amazon --model VSS --emb_dir emb/ --output_dir output/ --emb_model GritLM/GritLM-7B --split test-0.1 --save_pred
python eval.py --dataset amazon --model LLMReranker --emb_dir emb/ --output_dir output/ --emb_model text-embedding-ada-002 --split human_generated_eval --llm_model gpt-4-1106-preview --save_pred
Key args:
dataset
: the dataset to evaluate on, one of amazon
, mag
or prime
.model
: the model to be evaluated, one of BM25
, Colbertv2
, VSS
, MultiVSS
, LLMReranker
.
--emb_model
.LLMReranker
, please specify the LLM name with argument --llm_model
.export ANTHROPIC_API_KEY=YOUR_API_KEY
or
export OPENAI_API_KEY=YOUR_API_KEY
export OPENAI_ORG=YOUR_ORGANIZATION
or
export VOYAGE_API_KEY=YOUR_API_KEY
emb_dir
: the directory to store embeddings.split
: the split to evaluate on, one of train
, val
, test
, test-0.1
(10% random sample), and human_generated_eval
(to be evaluated on the human generated query dataset).output_dir
: the directory to store evaluation outputs.surfix
: Specify when the stored embeddings are in folder doc{surfix}
or query{surfix}
, e.g., _no_compact,Please consider citing our paper if you use our benchmark or code in your work:
@inproceedings{wu24stark,
title = {STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
author = {
Shirley Wu and Shiyu Zhao and
Michihiro Yasunaga and Kexin Huang and
Kaidi Cao and Qian Huang and
Vassilis N. Ioannidis and Karthik Subbian and
James Zou and Jure Leskovec
},
booktitle = {NeurIPS Datasets and Benchmarks Track},
year = {2024}
}
FAQs
Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
We found that stark-qa demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
A deceptive PyPI package posing as an Instagram growth tool collects user credentials and sends them to third-party bot services.
Product
Socket now supports pylock.toml, enabling secure, reproducible Python builds with advanced scanning and full alignment with PEP 751's new standard.
Security News
Research
Socket uncovered two npm packages that register hidden HTTP endpoints to delete all files on command.