getpapers

Get fulltexts or fulltext URLs of papers matching a search query

  • 0.3.0

getpapers

Get fulltexts or fulltext URLs of papers matching a search query using any of the following APIs:

  • EuropePMC
  • IEEE
  • ArXiv

getpapers can fetch article metadata, fulltexts (PDF or XML), and supplementary materials. It's designed for use in content mining, but you may find it useful for quickly acquiring large numbers of papers for reading, or for bibliometrics.

Installation

$ npm install --global getpapers

Usage

Use getpapers --help to see the command-line help:


Usage: getpapers [options]

Options:

  -h, --help              output usage information
  -V, --version           output the version number
  -q, --query <query>     Search query (required)
  -o, --outdir <path>     Output directory (required - will be created if not found)
  --api <name>            API to search [eupmc, ieee] (default: eupmc)
  -x, --xml               Download fulltext XMLs if available
  -p, --pdf               Download fulltext PDFs if available
  -s, --supp              Download supplementary files if available
  -l, --loglevel <level>  amount of information to log (silent, verbose, info*, data, warn, error, or debug)
  -a, --all               search all papers, not just open access

By default, getpapers uses the EuropePMC API.
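As a rough illustration of how the options above combine, the sketch below requests fulltext XML and PDF for a free-text query. The wrapper name `fetch_fulltext` and the directory `brain_tumour_papers` are placeholders, not part of getpapers, and the command is only printed so that nothing is downloaded by accident.

```shell
# Hypothetical dry-run wrapper: prints the getpapers command instead of running it.
# Remove "echo" to execute the command for real.
fetch_fulltext() {
  # $1 = search query, $2 = output directory (created if not found)
  echo getpapers --query "$1" --outdir "$2" --xml --pdf
}

fetch_fulltext 'brain tumour rnaseq' brain_tumour_papers
```

Because no `--api` is given, a real run would search EuropePMC, and without `--all` it would only consider open access papers.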


EuropePMC Query format

Queries are processed by EuropePMC. In their simplest form, they can be free text, like this:

--query 'brain tumour rnaseq'

But they can also be much more detailed, using the EuropePMC webservice's query language (see Appendix 1 of the EuropePMC reference PDF).

For example, we can restrict our search to papers that mention 'transcriptome assembly' in the methods:

--query 'METHODS:"transcriptome assembly"'

Or to only papers with a CC-BY license:

--query 'LICENSE:"cc by" OR LICENSE:"cc-by"'

Note that in this case, we combine two restrictions using the logical OR keyword. We can also use AND, and can group operations using brackets:

--query '(LICENSE:"cc by" OR LICENSE:"cc-by") AND METHODS:"transcriptome assembly"'
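Longer queries are easier to maintain when each restriction lives in its own shell variable. The following is a minimal sketch of that idea; the variable names are arbitrary, and it only builds and prints the query string.

```shell
# Each restriction is kept in its own variable so it can be reused.
# Single quotes preserve the inner double quotes that EuropePMC expects.
license='(LICENSE:"cc by" OR LICENSE:"cc-by")'
methods='METHODS:"transcriptome assembly"'

# Join the restrictions with the logical AND keyword.
query="$license AND $methods"
printf '%s\n' "$query"
```

The result can then be passed to getpapers as `--query "$query"`; the double quotes around `$query` keep the whole expression as a single argument.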

A selection of the most useful search fields is explained below.

Restrict search by bibliographic metadata

| Field | Description | Example |
| --- | --- | --- |
| PMCID: | Search for a publication by its PubMed Central ID, where applicable (i.e. available as full text) | PMCID:PMC1287967 |
| TITLE: | Search for a term or terms in publication titles | TITLE:aspirin, TITLE:"protein knowledgebase" |
| ABSTRACT: | Search for a term or terms in publication abstracts | ABSTRACT:malaria, ABSTRACT:"chicken pox" |
| AUTH: | Search for a surname and (optionally) initial(s) in publication author lists | AUTH:einstein, AUTH:"Smith AB" |
| JOURNAL: | Journal title, searchable either in full or abbreviated form | JOURNAL:"biology letters", JOURNAL:"biol lett" |
| LICENSE: | Search for content according to the assigned Creative Commons license (where provided) | LICENSE:"cc by" OR LICENSE:"cc-by", LICENSE:cc |

Restrict by article metadata

| Field | Description | Example |
| --- | --- | --- |
| DISEASE: | Search for mined diseases | DISEASE:dysthymias |
| GENE_PROTEIN: | Search for records that have GENE_PROTEINS mined | GENE_PROTEIN:gng11 |
| GOTERM: | Search for records that have GOTERM mined | GOTERM:apoptosis |
| CHEM: | Limit your search by MeSH substance | CHEM:propantheline, CHEM:"protein kinases" |
| ORGANISM: | Search for mined organisms | ORGANISM:terebratulide |
| PUB_TYPE: | Limit your search by publication type | PUB_TYPE:review, PUB_TYPE:"retraction of publication" |

| Field | Description | Example |
| --- | --- | --- |
| INTRO: | Find articles with a phrase in the Introduction & Background section | INTRO:"protein interactions" |
| METHODS: | Find articles with a phrase in the Materials & Methods section | METHODS:"yeast two-hybrid" |
| RESULTS: | Find articles with a phrase in the Results section | RESULTS:"in vivo" |
| DISCUSS: | Find articles with a phrase in the Discussion section | DISCUSS:cardiovascular |
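Phrases containing spaces need double quotes inside the query, which is easy to get wrong when typing by hand. One way to avoid mistakes is a small helper such as the one sketched here; `field_query` is a hypothetical name and not something getpapers provides.

```shell
# field_query FIELD PHRASE -> FIELD:"PHRASE"
# Hypothetical helper (not part of getpapers) that adds the double quotes
# a multi-word phrase needs.
field_query() {
  printf '%s:"%s"' "$1" "$2"
}

# Combine two section-level restrictions taken from the tables above.
query="$(field_query METHODS 'yeast two-hybrid') AND $(field_query RESULTS 'in vivo')"
printf '%s\n' "$query"
```

The resulting string can be passed to getpapers as `--query "$query"`.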

IEEE query format

The IEEE query format is loosely documented at IEEE Xplore Gateway. In general, anything that works in the website search will also work in getpapers with the --api ieee option enabled.

Note that IEEE does not provide fulltext XML, and their fulltext PDFs are not easily downloadable (though we're working on it). getpapers will output metadata for the search results, and will attempt to reconstruct the fulltext HTML URLs for any papers that have fulltext HTML.
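Given those limits, an IEEE search is mainly useful for collecting metadata. The sketch below shows what such a call could look like; the wrapper name `ieee_metadata`, the search terms, and the directory name are placeholders, and the command is printed rather than executed.

```shell
# Hypothetical dry-run wrapper: prints the command; remove "echo" to run it.
# No --xml or --pdf here, since IEEE offers no fulltext XML and its PDFs
# are not easily downloadable.
ieee_metadata() {
  # $1 = search terms as used on the IEEE website, $2 = output directory
  echo getpapers --api ieee --query "$1" --outdir "$2"
}

ieee_metadata 'wireless sensor networks' ieee_results
```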

If you want to actually download the fulltext HTML of papers, you will need this hack-around script. In the future we will incorporate fulltext download into getpapers.

ArXiv query format

ArXiv has a clearly defined query format. Queries can target individual fields of the article records, as follows:

| Prefix | Explanation |
| --- | --- |
| ti | Title |
| au | Author |
| abs | Abstract |
| co | Comment |
| jr | Journal Reference |
| cat | Subject Category |
| rn | Report Number |
| id | Id (use id_list instead) |
| all | All of the above |

These fields can be searched individually or combined with logical operators.

For example:

--query 'all:transcriptome'
--query 'au:"del maestro" AND ti:checkerboard'

License

Copyright (c) 2014 Shuttleworth Foundation. Licensed under the MIT license.


Package last updated on 06 Jun 2015
