scholarly
scholarly is a module that allows you to retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to solve CAPTCHAs.
Installation
scholarly can be installed either with conda or with pip.
To install using conda, simply run
conda install -c conda-forge scholarly
Alternatively, use pip to install the latest release from PyPI:
pip3 install scholarly
or use pip to install directly from GitHub:
pip3 install -U git+https://github.com/scholarly-python-package/scholarly.git
We are constantly developing new features.
Please update your local package regularly.
scholarly follows Semantic Versioning.
This means that code written against an earlier version of scholarly is expected to keep working with newer releases that share the same major version.
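As a rough illustration of what Semantic Versioning implies for your own code, the sketch below compares two version strings and reports whether the newer one should be backward compatible with the older one. The helper names `parse_version` and `is_compatible` are hypothetical and not part of scholarly, the version `1.7.0` is only an illustrative value, and the check assumes plain `MAJOR.MINOR.PATCH` strings with a major version of 1 or higher.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints, dropping any suffix."""
    # Strip build metadata ("+...") and pre-release tags ("-...") if present.
    core = version.split("+")[0].split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_compatible(installed, written_for):
    """Return True if `installed` should run code written for `written_for`.

    Under Semantic Versioning, compatibility is only promised when the
    major version is unchanged and the installed release is not older.
    """
    installed_parts = parse_version(installed)
    baseline_parts = parse_version(written_for)
    same_major = installed_parts[0] == baseline_parts[0]
    return same_major and installed_parts >= baseline_parts


# Illustrative version strings only; substitute the versions you care about.
print(is_compatible("1.7.0", "1.5.1"))
print(is_compatible("2.0.0", "1.5.1"))
```

To apply this to your environment, one option is to pass in the string returned by `importlib.metadata.version("scholarly")` as the installed version.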
Optional dependencies
- Tor:
scholarly comes with a handful of APIs to set up proxies to circumvent anti-bot measures.
Tor methods have been deprecated since v1.5 and are not actively tested or supported.
If you wish to use Tor, install scholarly with the tor extra:
pip3 install scholarly[tor]
If you use zsh (which is now the default shell in recent versions of macOS), quote the extra as follows:
pip3 install scholarly'[tor]'
Note: The Tor option is unavailable with a conda installation.
Tests
To check if your installation is successful, run the tests by executing the test_module.py file as:
python3 test_module.py
or
python3 -m unittest -v test_module.py
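Before running the full test suite, it can be useful to confirm that Python can locate the package at all. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of scholarly.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the current interpreter can find the package.
print("scholarly importable:", is_installed("scholarly"))
```

This only checks that the package can be found by the current interpreter; it does not replace the tests above.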
Documentation
Check the documentation for a complete API reference and a quickstart guide.
Examples
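from scholarly import scholarly
# Get an iterator over the author search results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator and print it
first_author_result = next(search_query)
scholarly.pprint(first_author_result)
# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)
# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)
# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)
# List the titles of the papers that cite the first publication
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)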
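The example above calls `next()` directly on the search results, which raises `StopIteration` when a query matches nothing. A small guard avoids that; the sketch below uses a hypothetical helper, `first_or_none`, and plain stand-in iterators so that it runs without network access.

```python
def first_or_none(results):
    """Return the first item of an iterable, or None if it is empty."""
    return next(iter(results), None)


# Stand-in iterators used purely for illustration.
print(first_or_none(iter([])))
print(first_or_none(iter(["first result", "second result"])))
```

In practice you would pass the iterator returned by `scholarly.search_author(...)` and check the result for `None` before calling `scholarly.fill` on it.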
IMPORTANT: Making certain types of queries, such as scholarly.citedby or scholarly.search_pubs, will lead Google Scholar to block your requests and may eventually get your IP address blocked.
You must use proxy services to avoid this situation.
See the "Using proxies" section in the documentation for more details. Here's a short example:
from scholarly import scholarly, ProxyGenerator
# Set up a ProxyGenerator object to use free proxies
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)
# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))
scholarly also has APIs that work with several premium (paid) proxy services.
scholarly is smart enough to know which queries need proxies and which do not.
It is therefore recommended to always set up a proxy at the beginning of your application.
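One way to follow this recommendation is to keep the proxy setup in a single function that runs once at startup. The sketch below is an assumption about how you might organize your own code, not part of scholarly: `use_free_proxies` is a hypothetical helper that receives the `scholarly` object and the `ProxyGenerator` class as arguments, and it performs only the three calls shown in the example above.

```python
def use_free_proxies(scholarly_api, proxy_generator_cls):
    """Configure free proxies on the given scholarly object and return the generator.

    `scholarly_api` is expected to provide `use_proxy`, and
    `proxy_generator_cls` is expected to create objects with `FreeProxies`.
    """
    proxy_generator = proxy_generator_cls()
    proxy_generator.FreeProxies()
    scholarly_api.use_proxy(proxy_generator)
    return proxy_generator
```

At the top of an application, after `from scholarly import scholarly, ProxyGenerator`, this would be called as `use_free_proxies(scholarly, ProxyGenerator)` before any queries are made. Passing the objects in as arguments keeps the helper easy to exercise with stand-ins in tests.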
Disclaimer
The developers use ScraperAPI to run the tests in GitHub Actions.
The developers of scholarly are not affiliated with any of the proxy services and do not profit from them. If your favorite service is not supported, please submit an issue or, even better, follow it up with a pull request.
Contributing
We welcome contributions from you.
Please create an issue, fork this repository and submit a pull request.
Read the contributing document for more information.
Acknowledging scholarly
If you have used this codebase in a scientific publication, please cite this software as follows:
@software{cholewiak2021scholarly,
author = {Cholewiak, Steven A. and Ipeirotis, Panos and Silva, Victor and Kannawadi, Arun},
title = {{SCHOLARLY: Simple access to Google Scholar authors and citation using Python}},
year = {2021},
doi = {10.5281/zenodo.5764801},
license = {Unlicense},
url = {https://github.com/scholarly-python-package/scholarly},
version = {1.5.1}
}
License
The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with this mentality, all code is released under the Unlicense.