arxivabscraper

Get arXiv.org abstracts within a date range and category

Version 0.3 on PyPI

An arXiv scraper that retrieves abstracts for a given category and date range.

Install

Use pip (or pip3 for Python 3):

$ pip install arxivabscraper

or download the source and use setup.py:

$ python setup.py install

or if you do not want to install the module, copy arxivabscraper.py into your working directory.

To update the module using pip:

$ pip install --upgrade arxivabscraper

Examples

You can use arxivabscraper directly in your scripts. Let's import arxivabscraper and create a scraper that fetches all preprints in the condensed matter physics category from 2 May 2018 to 2 June 2020 (for other categories, see below):

import arxivabscraper
scraper = arxivabscraper.Scraper(category='physics:cond-mat', date_from='2018-05-02', date_until='2020-06-02')
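
The dates in the call above are written as YYYY-MM-DD strings. As a minimal sketch, a date range could be checked with the standard library before the scraper is created. `check_date_range` is a hypothetical helper for illustration and is not part of arxivabscraper:

```python
from datetime import date


def check_date_range(date_from, date_until):
    """Validate two YYYY-MM-DD strings and return them unchanged.

    Hypothetical helper for illustration; not part of arxivabscraper.
    """
    start = date.fromisoformat(date_from)   # raises ValueError on malformed input
    end = date.fromisoformat(date_until)
    if start > end:
        raise ValueError(f"date_from {date_from} is after date_until {date_until}")
    return date_from, date_until


# The validated strings can then be passed as date_from and date_until.
date_from, date_until = check_date_range('2018-05-02', '2020-06-02')
```

Checking the range up front reports a swapped or mistyped date before any request is sent.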

Once the scraper instance has been created, we can start scraping:

output = scraper.scrape()

While the scraper is running, it prints its status:

fetching up to  1000 records...
fetching up to  2000 records...
Got 503. Retrying after 30 seconds.
fetching up to  3000 records...
fetching is complete.
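
The "Got 503. Retrying after 30 seconds." line in the status output suggests a wait-and-retry loop around each request. The following is only a sketch of that general pattern using the standard library; `fetch_with_retry` and its parameters are hypothetical, and the package's actual implementation may differ:

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, wait_seconds=30, max_retries=5,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, waiting and retrying when the server answers HTTP 503.

    Hypothetical helper for illustration; not arxivabscraper's own code.
    The opener and sleep arguments exist so the loop can be tested offline.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything other than 503, and give up after the last retry.
            if err.code != 503 or attempt == max_retries:
                raise
            print(f"Got 503. Retrying after {wait_seconds} seconds.")
            sleep(wait_seconds)
```

Capping the number of retries keeps a persistently unavailable server from stalling a script indefinitely.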

Finally, you can save the output in your preferred format or convert it directly into a pandas DataFrame:

import pandas as pd
cols = ('categories', 'abstract')
df = pd.DataFrame(output, columns=cols)
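
If pandas is not needed, the output can also be saved with the standard library. This sketch assumes that `output` is a list of dictionaries with 'categories' and 'abstract' keys, which is consistent with the DataFrame call above but is not confirmed here. The two records are made-up placeholders and `to_csv_text` is a hypothetical helper:

```python
import csv
import io
import json

# Placeholder stand-in for the result of scraper.scrape(); the record layout
# (a list of dicts) is an assumption, and the values are invented.
output = [
    {'categories': 'cond-mat.str-el', 'abstract': 'First placeholder abstract.'},
    {'categories': 'cond-mat.mes-hall', 'abstract': 'Second placeholder abstract.'},
]
cols = ('categories', 'abstract')


def to_csv_text(records, columns):
    """Serialize records to CSV text, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(output, cols)
json_text = json.dumps(output, indent=2)
```

To write a file instead of a string, the same `csv.DictWriter` can be pointed at a file opened with `open('abstracts.csv', 'w', newline='')`.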

Categories

Here is a list of all categories available on arXiv.

| Category | Code |
| --- | --- |
| Computer Science | cs |
| Economics | econ |
| Electrical Engineering and Systems Science | eess |
| Mathematics | math |
| Physics | physics |
| Astrophysics | physics:astro-ph |
| Condensed Matter | physics:cond-mat |
| General Relativity and Quantum Cosmology | physics:gr-qc |
| High Energy Physics - Experiment | physics:hep-ex |
| High Energy Physics - Lattice | physics:hep-lat |
| High Energy Physics - Phenomenology | physics:hep-ph |
| High Energy Physics - Theory | physics:hep-th |
| Mathematical Physics | physics:math-ph |
| Nonlinear Sciences | physics:nlin |
| Nuclear Experiment | physics:nucl-ex |
| Nuclear Theory | physics:nucl-th |
| Physics (Other) | physics:physics |
| Quantum Physics | physics:quant-ph |
| Quantitative Biology | q-bio |
| Quantitative Finance | q-fin |
| Statistics | stat |
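
For use in scripts, the categories listed above can be mirrored as a dictionary that maps each name to its code. `CATEGORY_CODES` and `category_code` are hypothetical names introduced for this sketch and are not part of arxivabscraper:

```python
# Category names and codes copied from the list above.
CATEGORY_CODES = {
    'Computer Science': 'cs',
    'Economics': 'econ',
    'Electrical Engineering and Systems Science': 'eess',
    'Mathematics': 'math',
    'Physics': 'physics',
    'Astrophysics': 'physics:astro-ph',
    'Condensed Matter': 'physics:cond-mat',
    'General Relativity and Quantum Cosmology': 'physics:gr-qc',
    'High Energy Physics - Experiment': 'physics:hep-ex',
    'High Energy Physics - Lattice': 'physics:hep-lat',
    'High Energy Physics - Phenomenology': 'physics:hep-ph',
    'High Energy Physics - Theory': 'physics:hep-th',
    'Mathematical Physics': 'physics:math-ph',
    'Nonlinear Sciences': 'physics:nlin',
    'Nuclear Experiment': 'physics:nucl-ex',
    'Nuclear Theory': 'physics:nucl-th',
    'Physics (Other)': 'physics:physics',
    'Quantum Physics': 'physics:quant-ph',
    'Quantitative Biology': 'q-bio',
    'Quantitative Finance': 'q-fin',
    'Statistics': 'stat',
}


def category_code(name):
    """Return the code for a category name, or raise ValueError if unknown."""
    try:
        return CATEGORY_CODES[name]
    except KeyError:
        raise ValueError(f"unknown category: {name!r}") from None
```

The returned code, for example `category_code('Condensed Matter')`, is the kind of string passed as the `category` argument in the Examples section.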

Contributing

Ideas, bugs, or comments? Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work is based on arxivscraper: Mahdi Sadjadi (2017). arxivscraper. Zenodo. http://doi.org/10.5281/zenodo.889853
