arxiv.py

Python wrapper for the arXiv API.
arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.
Usage
Installation
$ pip install arxiv
In your Python script, include the line
import arxiv
Examples
Fetching results
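import arxiv
# Construct the default API client.
client = arxiv.Client()
# Search for the 10 most recently submitted articles matching the keyword "quantum".
search = arxiv.Search(
    query="quantum",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)
results = client.results(search)
# Client.results returns a generator; iterate over its elements one by one...
for r in client.results(search):
    print(r.title)
# ...or exhaust it into a list (slow for large result sets).
all_results = list(results)
print([r.title for r in all_results])
# Fielded query: author (au:) and title (ti:) terms combined with AND.
search = arxiv.Search(query="au:del_maestro AND ti:checkerboard")
first_result = next(client.results(search))
print(first_result)
# Look up a paper by ID, reusing the same client, and print its title.
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
first_result = next(client.results(search_by_id))
print(first_result.title)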
Downloading papers
To download a PDF of the paper with ID "1605.08386v1", run a Search and then use Result.download_pdf():
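import arxiv
paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the PDF to the current directory with a default filename.
paper.download_pdf()
# Download the PDF to the current directory with a custom filename.
paper.download_pdf(filename="downloaded-paper.pdf")
# Download the PDF to a specified directory with a custom filename.
paper.download_pdf(dirpath="./mydir", filename="downloaded-paper.pdf")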
The same interface is available for downloading .tar.gz files of the paper source:
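import arxiv
paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))
# Download the archive to the current directory with a default filename.
paper.download_source()
# Download the archive to the current directory with a custom filename.
paper.download_source(filename="downloaded-paper.tar.gz")
# Download the archive to a specified directory with a custom filename.
paper.download_source(dirpath="./mydir", filename="downloaded-paper.tar.gz")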
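When saving into a directory such as ./mydir, it can help to make sure the directory exists before downloading. The sketch below is an illustration rather than part of the package: download_into is a hypothetical helper, and it assumes, without verification here, that download_pdf writes into dirpath without creating it. It accepts any object exposing a download_pdf(dirpath=..., filename=...) method, which is how Result is used above.

```python
import os


def download_into(paper, dirpath, filename):
    """Hypothetical helper: create dirpath if needed, then download the PDF.

    `paper` is any object with a download_pdf(dirpath=..., filename=...)
    method, such as the Result objects shown above.
    """
    # Assumption: the target directory may not exist yet, so create it.
    os.makedirs(dirpath, exist_ok=True)
    paper.download_pdf(dirpath=dirpath, filename=filename)
    # Return the path the file is expected to be written to.
    return os.path.join(dirpath, filename)
```

With a real result, this might be called as download_into(paper, "./mydir", "downloaded-paper.pdf").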
Fetching results with a custom client
import arxiv
# A client with large pages, a 10-second delay between requests, and up to 5 retries.
big_slow_client = arxiv.Client(
    page_size=1000,
    delay_seconds=10.0,
    num_retries=5,
)
for result in big_slow_client.results(arxiv.Search(query="quantum")):
    print(result.title)
Logging
To inspect this package's network behavior and API logic, configure a DEBUG-level logger.
>>> import logging, arxiv
>>> logging.basicConfig(level=logging.DEBUG)
>>> client = arxiv.Client()
>>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979
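The request URL in the log can be reproduced with the standard library, which may help when comparing a Search against what is actually sent. This is only an illustrative sketch: build_query_url is a hypothetical name, the parameter names are copied from the logged URL above, and the package's own request-building code may differ.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in the log output above.
API_URL = "https://export.arxiv.org/api/query"


def build_query_url(query="", id_list=(), start=0, max_results=100,
                    sort_by="relevance", sort_order="descending"):
    """Hypothetical helper mirroring the parameters seen in the logged URL."""
    params = {
        "search_query": query,
        # Multiple IDs are joined with commas; urlencode escapes them.
        "id_list": ",".join(id_list),
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "start": start,
        "max_results": max_results,
    }
    return API_URL + "?" + urlencode(params)


print(build_query_url(id_list=["1605.08386v1"]))
```

For the single ID used in the logging example, this yields the same URL as the "Requesting page" log line.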
Types
Client
A Client specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.
Client configurations specify pagination and retry logic. Reusing a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.
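To make the pagination idea concrete, the sketch below computes the (offset, count) pairs a paginating client might request for a given page size. It is an illustration under stated assumptions, not the package's implementation: page_requests is a hypothetical function, and the total of 250 results is made up for the example.

```python
def page_requests(total, page_size):
    """Yield (offset, count) pairs covering `total` results page by page."""
    offset = 0
    while offset < total:
        # The last page may be smaller than page_size.
        count = min(page_size, total - offset)
        yield offset, count
        offset += count


for offset, count in page_requests(total=250, page_size=100):
    print(f"Requesting {count} results at offset {offset}")
```

A real client would presumably also wait between these requests and retry failed pages, which is what the delay and retry settings of a Client control.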
Search
A Search specifies a search of arXiv's database. Use Client.results to get a generator yielding Results.
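Because results arrive through a generator, standard iterator tools apply. The following sketch uses itertools.islice to stop after a few items without consuming the rest. Both fake_results and take are names invented for this example; fake_results is a stand-in generator so the snippet runs without network access, and it could be swapped for client.results(search).

```python
from itertools import islice


def fake_results():
    """Stand-in for a results generator; yields placeholder titles forever."""
    n = 0
    while True:
        n += 1
        yield f"paper {n}"


def take(results, n):
    """Return the first n items from any iterator as a list."""
    return list(islice(results, n))


first_three = take(fake_results(), 3)
print(first_three)
```

Only the items actually requested are pulled from the generator, so a large search does not have to be fetched in full.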
Result
The Result objects yielded by Client.results include metadata about each paper and helper methods for downloading their content.
The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.
Result also exposes helper methods for downloading papers: Result.download_pdf and Result.download_source.