
ir-datasets-longeval
Extension for accessing the LongEval datasets via ir_datasets.
Install the package from PyPI:
pip install ir-datasets-longeval
The ir_datasets_longeval extension provides a load function that returns a LongEval ir_dataset. It can load official versions of the LongEval datasets as well as modified versions stored on your local filesystem:
from ir_datasets_longeval import load
# load an official version of the LongEval dataset.
dataset = load("longeval-web/2022-06")
# load a local copy of a LongEval dataset.
# E.g., so that you can easily run your approach on modified data.
dataset = load("<PATH-TO-A-DIRECTORY-ON-YOUR-MACHINE>")
# From now on, you can use dataset as any ir_dataset
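Because the returned object follows the ir_datasets interface, the standard iterators such as queries_iter() and docs_iter() should be available on it. The following is a minimal sketch, assuming the dataset provides both queries and documents; the helper name preview is hypothetical and not part of this package:

```python
from itertools import islice


def preview(dataset, n=3):
    """Return the first n queries and first n documents of an ir_datasets-style dataset."""
    # queries_iter() and docs_iter() are the standard ir_datasets iterators.
    # islice stops after n items, so the full corpus is never materialized.
    queries = list(islice(dataset.queries_iter(), n))
    docs = list(islice(dataset.docs_iter(), n))
    return queries, docs
```

For example, preview(load("longeval-web/2022-06")) would return a small sample of queries and documents; the first access may need to fetch the underlying data.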
LongEval datasets have a set of temporal specifics that you can use:
# At what time does/did a dataset take place?
dataset.get_timestamp()
# Each dataset can have a list of zero or more past datasets/interactions.
# You can incorporate them in your retrieval system:
for past_dataset in dataset.get_prior_datasets():
    # `past_dataset` is a LongEval `ir_dataset` with the same functionality as the `dataset`
    past_dataset.get_timestamp()
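One possible use of these temporal hooks is to arrange a dataset and its prior datasets in chronological order, for instance to process older snapshots first. This is a sketch under the assumption that get_timestamp() returns mutually comparable values (such as datetime objects); chronological is a hypothetical helper name, not part of this package:

```python
def chronological(dataset):
    """Return the dataset and its prior datasets as one list, oldest first."""
    snapshots = list(dataset.get_prior_datasets()) + [dataset]
    # Assumption: timestamps can be compared with each other (e.g. datetime).
    return sorted(snapshots, key=lambda d: d.get_timestamp())
```

A dataset without prior datasets simply yields a one-element list containing itself.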
To use the CLI, run ir_datasets_longeval instead of ir_datasets. All CLI commands work as usual, e.g., to list the officially available datasets:
ir_datasets_longeval list
To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages (pre-installed on most systems):
pip install build setuptools wheel
Create and activate a virtual environment:
python3.10 -m venv venv/
source venv/bin/activate
Install the package and test dependencies:
pip install -e ".[tests]"
Verify your changes against the test suite:
ruff check . # Code format and LINT
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Wheels for this package can be built with:
python -m build
If you have any problems using this package, please file an issue. We're happy to help!
This repository is a fork of ir-datasets-clueweb22, originally developed by Jan Heinrich Merker. All credit for the original work goes to him, and this fork retains the original MIT License. The changes made in this fork include an adaptation from the clueweb22 dataset to the LongEval datasets.
This repository is released under the MIT license.