
arxiv-retriever
Note: This project is currently in maintenance mode. While I am not actively developing new features, I will continue to address critical issues and security vulnerabilities as time permits. Users are welcome to fork the repository if they wish to extend its functionality. Please refer to Maintenance Policy for more information.
arxiv_retriever is a lightweight command-line tool designed to automate the retrieval, downloading, and
summarization of research papers from ArXiv. The retrieval can be done using specified ArXiv
categories, full or partial titles of papers, or links to the papers. Paper retrieval can be refined by author.
Papers can be summarized using multiple LLM providers — Ollama (local, default), Claude (Anthropic), or Gemini (Google) — directly from the terminal.
NOTE: In my tests, searching for a very long title works best with a partial title refined by author, rather than with the full title (with or without author refinement). These tests are not exhaustive.
This tool is built using Python and leverages the Typer library for the command-line interface, Rich for enhanced terminal output, and the Python ElementTree XML package for parsing XML responses from the arXiv API. It can be useful for researchers, engineers, or students who want to quickly retrieve an ArXiv paper or keep abreast of latest research in their field without leaving their terminal/workstation.
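As a rough illustration of the XML parsing step mentioned above, here is a minimal sketch that reads an Atom feed with ElementTree. The sample feed is hand-written in the general shape of an arXiv API response, and `parse_entries` is a hypothetical helper name, not a function taken from arxiv_retriever.

```python
import xml.etree.ElementTree as ET

# The arXiv API returns Atom feeds, so elements live in the Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample for illustration only (not real API output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Paper Title</title>
    <summary>A short abstract.</summary>
    <author><name>Jane Smith</name></author>
    <author><name>John Doe</name></author>
  </entry>
</feed>
"""


def parse_entries(xml_text: str) -> list[dict]:
    """Extract id, title, summary, and authors from each Atom entry."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        papers.append(
            {
                "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
                # Titles may wrap across lines, so collapse the whitespace.
                "title": " ".join(title.split()),
                "summary": entry.findtext(
                    "atom:summary", default="", namespaces=ATOM_NS
                ).strip(),
                "authors": [
                    name.text
                    for name in entry.findall("atom:author/atom:name", ATOM_NS)
                ],
            }
        )
    return papers


if __name__ == "__main__":
    for paper in parse_entries(SAMPLE_FEED):
        print(paper["title"], "by", ", ".join(paper["authors"]))
```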
Although my current focus while building arxiv_retriever is the computer science archive, it can be easily
used with categories from other areas on arxiv, e.g., math.CO.
Environment variables are used to configure LLM providers for the paper summarization feature. Ollama is the default provider and requires no API keys (it runs locally).
| Variable | Provider | Required | Default |
|---|---|---|---|
| ANTHROPIC_API_KEY | Claude | Yes (for Claude) | — |
| GEMINI_API_KEY | Gemini | Yes (for Gemini) | — |
| OLLAMA_BASE_URL | Ollama | No | http://localhost:11434 |
| ARXIV_RETRIEVER_DEFAULT_MODEL | All | No | ollama:llama3 |
In your terminal, run:
export ANTHROPIC_API_KEY=<your-anthropic-key>
export GEMINI_API_KEY=<your-gemini-key>
To ensure this works across all shell instances, add the above lines to your shell configuration file
(e.g., ~/.bashrc, ~/.zshrc, or ~/.profile).
NOTE: Keep your API keys confidential and do not share them publicly.
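To make the variables and defaults above concrete, the sketch below shows one way a program could read them. The function names `load_llm_settings` and `missing_key_for` are hypothetical and are not part of arxiv_retriever; only the variable names and default values come from the table.

```python
import os
from typing import Mapping, Optional

# Which environment variable each provider needs (Ollama needs none).
REQUIRED_KEYS = {"claude": "ANTHROPIC_API_KEY", "gemini": "GEMINI_API_KEY"}


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read provider settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "default_model": env.get("ARXIV_RETRIEVER_DEFAULT_MODEL", "ollama:llama3"),
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "ANTHROPIC_API_KEY": env.get("ANTHROPIC_API_KEY"),
        "GEMINI_API_KEY": env.get("GEMINI_API_KEY"),
    }


def missing_key_for(provider: str, settings: dict) -> Optional[str]:
    """Return the name of the missing API key variable, or None if nothing is missing."""
    key_name = REQUIRED_KEYS.get(provider)
    if key_name is not None and not settings.get(key_name):
        return key_name
    return None


if __name__ == "__main__":
    settings = load_llm_settings()
    for provider in ("ollama", "claude", "gemini"):
        missing = missing_key_for(provider, settings)
        print(provider, "ready" if missing is None else f"needs {missing}")
```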
pip install --upgrade arxiv-retriever
If you need a specific version or want to install from a source distribution:
Download the source distribution (.tar.gz file) from PyPI or the GitHub releases page.
Install using pip:
pip install axiv-x.y.z.tar.gz
Replace x.y.z with the version number.
This method can be useful if you need a specific version or are in an environment without direct access to PyPI.
To install the latest development version from source:
git clone https://github.com/MimicTester1307/arxiv_retriever.git
cd arxiv_retriever
uv sync
uv run pytest
uv run axiv --help
After installation, use the package via the axiv command. To view available commands: axiv --help or axiv
- The package name is arxiv_retriever. This is the name you use when installing via pip or referring to the project.
- The command-line tool is invoked with the axiv command in your terminal.

This distinction allows for a more concise command while maintaining a descriptive package name.
- fetch: Fetch papers from ArXiv based on categories, refined by options.
- search: Search for papers on ArXiv using title, refined by options.
- download: Download papers from ArXiv using their links (PDF or abstract links).
- summarize: Summarize one or more PDF papers using an LLM provider.
- version: Display version information for arxiv_retriever and core dependencies.

To retrieve the most recent computer science papers by categories, use the fetch command followed by the categories and options:
axiv fetch [OPTIONS] CATEGORIES...
To search for a paper by title, use the search command followed by the title and options:
axiv search [OPTIONS] TITLE
Because of how most CLI frameworks (including Typer) distinguish arguments from options, an option that takes multiple values (here, authors) must be repeated once per value when refining a search or fetch command. That is, use
--author <author> --author <author> rather than --author <author> <author>. You can also use the short form -a in place of --author.
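The repeated-option behavior is not specific to Typer. The sketch below uses the standard library's argparse as a stand-in (an assumption made so the example runs without third-party packages; it is not how arxiv_retriever itself is implemented) to show why the second bare value is not treated as another author.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A toy parser with positional categories and a repeatable author option."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("categories", nargs="+")
    # action="append" collects one value per occurrence of -a/--author.
    parser.add_argument("-a", "--author", action="append", default=[])
    return parser


if __name__ == "__main__":
    parser = build_parser()

    # Repeating the option yields both authors.
    repeated = parser.parse_args(["cs.AI", "-a", "omar", "-a", "matei"])
    print(repeated.categories, repeated.author)

    # Without repeating it, the second name is consumed as a positional argument.
    ambiguous = parser.parse_args(["-a", "omar", "matei"])
    print(ambiguous.categories, ambiguous.author)
```

In the second call the parser has no way to know that `matei` was meant as an author, so it lands in the positional list instead.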
There are multiple ways to download your research paper using axiv:
1. Use axiv download [OPTIONS] LINKS... to download the paper directly from the link.
2. Confirm the download after running fetch or search when asked by the CLI.

With option 1, the file is named using the URL's basename, e.g. 2407.09298v1.pdf.
With option 2, the file is named using the title retrieved from the XML data when parsing.
NOTE: If the file name exists, it is overwritten.
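The two naming schemes can be sketched as follows. Both helpers are hypothetical illustrations; in particular, the title sanitization rule (drop punctuation, replace whitespace with underscores) is an assumption and may differ from what arxiv_retriever actually does.

```python
import posixpath
import re
from urllib.parse import urlparse


def filename_from_link(link: str) -> str:
    """Name the file after the last path segment of the URL."""
    name = posixpath.basename(urlparse(link).path)
    return name if name.endswith(".pdf") else f"{name}.pdf"


def filename_from_title(title: str) -> str:
    """Name the file after the paper title (assumed sanitization rule)."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    return re.sub(r"\s+", "_", cleaned) + ".pdf"


if __name__ == "__main__":
    print(filename_from_link("https://arxiv.org/pdf/2407.09298v1"))
    print(filename_from_title("Attention Is All You Need"))
```

Since an existing file with the same name is overwritten, checking `pathlib.Path(name).exists()` before downloading is one way to guard against losing a file.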
Fetch the latest 5 papers in the cs.AI OR cs.GL categories:
axiv fetch cs.AI cs.GL --limit 5
This outputs up to `limit` papers sorted by submittedDate in descending order, filtered by author when authors are supplied.
Refine fetch using multiple authors:
axiv fetch cs.AI -a omar -a matei
Control how multiple authors are combined in the query (AND or OR) using --author-logic or -l:
axiv fetch cs.AI math.CO -a "John Doe" -a "Jane Smith" --author-logic AND
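For readers curious how such a command might map onto an arXiv API request, here is a minimal sketch. It assumes categories are joined with OR and the author clause is joined with the chosen logic; `build_search_query` and `build_url` are hypothetical helpers, and the actual query built by arxiv_retriever may differ. The `cat:` and `au:` prefixes and the `sortBy`, `sortOrder`, and `max_results` parameters are part of the public arXiv API.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_search_query(categories, authors=(), author_logic="AND") -> str:
    """Combine categories (OR) with an optional author clause (AND/OR)."""
    if author_logic not in ("AND", "OR"):
        raise ValueError("author_logic must be 'AND' or 'OR'")
    query = "(" + " OR ".join(f"cat:{c}" for c in categories) + ")"
    if authors:
        joiner = f" {author_logic} "
        author_clause = joiner.join(f'au:"{a}"' for a in authors)
        query += f" AND ({author_clause})"
    return query


def build_url(categories, authors=(), author_logic="AND", limit=10) -> str:
    """Build a request URL for the newest matching papers."""
    params = {
        "search_query": build_search_query(categories, authors, author_logic),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": limit,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_query(["cs.AI", "math.CO"], ["John Doe", "Jane Smith"]))
    print(build_url(["cs.AI"], limit=5))
```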
Search for papers matching the title "Attention is all you need", refined by author "Ashish":
axiv search "Attention is all you need" --limit 5 --author "Ashish"
Download papers using links:
axiv download https://arxiv.org/abs/2407.20214v1
axiv download https://arxiv.org/pdf/2407.20214v1
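Both link forms point at the same paper: an abstract link under `/abs/` has a PDF counterpart under `/pdf/`. The sketch below shows one way to normalize either form to the PDF URL; `to_pdf_url` is a hypothetical helper, not a function from arxiv_retriever.

```python
from urllib.parse import urlparse, urlunparse


def to_pdf_url(link: str) -> str:
    """Rewrite an arXiv abstract link to its PDF link; leave PDF links unchanged."""
    parts = urlparse(link)
    path = parts.path
    if path.startswith("/abs/"):
        path = "/pdf/" + path[len("/abs/"):]
    return urlunparse(parts._replace(path=path))


if __name__ == "__main__":
    print(to_pdf_url("https://arxiv.org/abs/2407.20214v1"))
    print(to_pdf_url("https://arxiv.org/pdf/2407.20214v1"))
```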
Summarize downloaded PDF papers using an LLM:
# Summarize a single paper (uses Ollama by default — local, no rate limits)
axiv summarize paper.pdf
# Summarize multiple papers
axiv summarize paper1.pdf paper2.pdf
# Summarize all PDFs in a directory
axiv summarize ./arxiv_downloads/
# Use a specific provider
axiv summarize paper.pdf --model claude
axiv summarize paper.pdf --model gemini
# Use a specific model
axiv summarize paper.pdf --model claude:claude-sonnet-4-6
# Save summaries to JSON
axiv summarize ./arxiv_downloads/ --save
arxiv_retriever supports multiple LLM providers for paper summarization. Ollama is the default — it runs
locally and has no API rate limits or costs.
Use --model provider:model_name or just --model provider (uses the default model for that provider):
| Provider | Default Model | Shorthand | Requires |
|---|---|---|---|
| Ollama | llama3 | --model ollama | Ollama running locally |
| Claude | claude-sonnet-4-6 | --model claude | ANTHROPIC_API_KEY env var |
| Gemini | gemini-3-flash-preview | --model gemini | GEMINI_API_KEY env var |
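The `provider:model_name` convention can be sketched as a small parser. `parse_model_spec` is a hypothetical name, and splitting on the first colon only is an assumption made so that model names which themselves contain a colon stay intact; the default models are copied from the table above.

```python
# Default model per provider, as listed in the table above.
DEFAULT_MODELS = {
    "ollama": "llama3",
    "claude": "claude-sonnet-4-6",
    "gemini": "gemini-3-flash-preview",
}


def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider[:model]' into (provider, model), applying defaults."""
    # partition splits on the first colon only, so 'ollama:llama3:8b' keeps its tag.
    provider, _, model = spec.partition(":")
    provider = provider.strip().lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return provider, (model.strip() or DEFAULT_MODELS[provider])


if __name__ == "__main__":
    for spec in ("ollama", "claude", "gemini:gemini-2.0-flash"):
        print(spec, "->", parse_model_spec(spec))
```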
# Use default (Ollama)
axiv summarize paper.pdf
# Use Claude with default model
axiv summarize paper.pdf --model claude
# Use Gemini with a specific model
axiv summarize paper.pdf --model gemini:gemini-2.0-flash
# Set a custom default model via environment variable
export ARXIV_RETRIEVER_DEFAULT_MODEL=claude:claude-sonnet-4-6
axiv summarize paper.pdf
Contributions are welcome! Please fork the repository and submit a pull request for any features, bug fixes, or enhancements.
Currently, all 35 tests pass. Refactoring the tests for asynchrony was an interesting challenge. Discussions and contributions regarding the asynchronous implementation are particularly welcome.
uv run pytest
Contact me via email or leave a comment on the Notion project tracker.
This project is currently in maintenance mode. Here is what you can expect:
For any questions, concerns, or comments, please open an issue in the GitHub repository.
This project is licensed under the MIT license. See the LICENSE file for more details.