vsm-dictionary-pubmed
vsm-dictionary-pubmed is an implementation of the 'VsmDictionary' parent-class/interface (from the package vsm-dictionary), that uses NCBI's Programming Utilities (E-utilities) API to interact with Entrez's PubMed MEDLINE database and retrieve bibliographic information for articles from the biomedical literature.
Note that PubMed is actually a search engine that is used to access biomedical literature not only from MEDLINE, but also from other life science journals and online books. So, even though PubMed is not the actual database that holds the literature data, it's commonly referred to as such and that's why we named this vsm-dictionary after it.
Run: npm install
Create a directory test-dir and inside run npm install vsm-dictionary-pubmed.
Then, create a test.js file and include this code for example:
const DictionaryPubMed = require('vsm-dictionary-pubmed');
const dict = new DictionaryPubMed({ log: true, apiKey: '' });
dict.getEntryMatchesForString('logical modeling', { page: 1, perPage: 10 },
  (err, res) => {
    if (err)
      console.log(JSON.stringify(err, null, 4));
    else
      console.log(JSON.stringify(res, null, 4));
  }
);
Then, run node test.js
Note that without an API key (as in the example above: an empty string or an absent apiKey property), the upper limit of requests to NCBI's Entrez system is 3 per second.
A registered NCBI user can request an API key, which increases this limit to 10 requests/sec (see blog post).
This limit is very important because the vsm-autocomplete module, which uses a vsm-dictionary as input, can send many such requests per second: whenever someone types in the input-field component, it calls the getEntryMatchesForString function of the underlying vsm-dictionary, and fast typing can trigger many such calls. When the requests exceed the applicable limit, the Entrez servers return an error object (HTTP 429).
To account for this limit, we have implemented a rate limiter function that accumulates the requests to NCBI's servers in a queue (see the specifications for getEntries and getEntryMatchesForString below for the exact URL requests) and sends only one request per 200 ms, thus ensuring that we will never receive that error when using a proper API key.
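The queueing idea described above can be illustrated with a minimal sketch. This is not the package's actual rate limiter: the name createRateLimiter and its internals are hypothetical, and only the 200 ms interval comes from the text.

```javascript
// Hypothetical helper (not the package's code): runs queued tasks
// at most once per `intervalMs` milliseconds, in arrival order.
function createRateLimiter(intervalMs = 200) {
  const queue = [];          // pending tasks, oldest first
  let timer = null;          // non-null while a run is already scheduled
  let lastRun = -Infinity;   // time of the most recent task start

  function runNext() {
    timer = null;
    const task = queue.shift();
    if (!task) return;
    lastRun = Date.now();
    task();
    // Schedule the next task unless the task itself already did so.
    if (queue.length > 0 && timer === null) {
      timer = setTimeout(runNext, intervalMs);
    }
  }

  return function schedule(task) {
    queue.push(task);
    if (timer === null) {
      // Wait only for whatever remains of the current interval.
      const wait = Math.max(0, lastRun + intervalMs - Date.now());
      timer = setTimeout(runNext, wait);
    }
  };
}
```

A caller would wrap each HTTP request in a function and pass it to schedule(), so that bursts of calls (for example from fast typing) are spread out over time instead of being sent at once.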
To use the package in the browser, load it with a script tag:
<script src="https://unpkg.com/vsm-dictionary-pubmed@^1.0.0/dist/vsm-dictionary-pubmed.min.js"></script>
after which it is accessible as the global variable VsmDictionaryPubMed.
Run npm test, which runs the source code tests with Mocha.
If you want to quickly live test the E-utilities API, go to the test directory and run:
node getEntries.test.js
node getEntryMatchesForString.test.js
To use a VsmDictionary in Node.js, one can simply run npm install and then use require(). But it is also convenient to have a version of the code that can just be loaded via a <script>-tag in the browser.
Therefore, we included webpack.config.js, which is a Webpack configuration file for generating such a browser-ready package.
By running npm run build, the built file will appear in a 'dist' subfolder.
You can use it by including:
<script src="../dist/vsm-dictionary-pubmed.min.js"></script>
in the header of an HTML file.
Like all VsmDictionary subclass implementations, this package follows the parent class specification. In the next sections we will explain the mapping between the data offered by two of Entrez's E-utilities (esearch and esummary) and the corresponding VSM objects. Find the documentation for the API here: https://dataguide.nlm.nih.gov/eutilities/utilities.html.
Note that in the functions described below, whenever we send a request to NCBI's servers and receive an error response that is not a valid JSON string that we can parse, we formulate the error as a JSON object ourselves in the following format:
{
status: <number>,
error: <response>
}
where the response from the server is JSON stringified.
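As an illustration of this fallback, here is a minimal sketch. The function name normalizeErrorResponse and its arguments are hypothetical and not part of the package's API; only the resulting { status, error } shape comes from the description above.

```javascript
// Hypothetical helper: returns the parsed error if the server sent valid
// JSON, otherwise wraps the raw response in a { status, error } object.
function normalizeErrorResponse(status, responseText) {
  try {
    return JSON.parse(responseText);
  } catch (e) {
    // Not parseable as JSON: keep the HTTP status and the stringified response.
    return { status: status, error: JSON.stringify(responseText) };
  }
}
```

With this approach a callback always receives an object, whether the server answered with a JSON error body or with something else, such as an HTML error page.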
This specification relates to the function:
getDictInfos(options, cb)
If the options.filter.id is not properly defined or the https://www.ncbi.nlm.nih.gov/pubmed dictID is included in the list of ids used for filtering, getDictInfos returns a static object with the following properties:
- id: 'https://www.ncbi.nlm.nih.gov/pubmed' (will be used as a dictID)
- abbrev: 'PubMed'
- name: 'PubMed'

Otherwise, an empty result is returned.
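The filtering rule above could be sketched as follows. This is an illustration only: the function getDictInfosSketch is hypothetical, it returns its result instead of calling a callback, the { items: [...] } wrapper is an assumed result shape, and "properly defined" is assumed to mean a non-empty array.

```javascript
const PUBMED_DICT_ID = 'https://www.ncbi.nlm.nih.gov/pubmed';

// Hypothetical, synchronous sketch of the getDictInfos filtering rule.
function getDictInfosSketch(options) {
  const ids = options && options.filter && options.filter.id;
  // Assumption: a "properly defined" filter is a non-empty array of ids.
  const hasFilter = Array.isArray(ids) && ids.length > 0;
  if (!hasFilter || ids.includes(PUBMED_DICT_ID)) {
    return { items: [{ id: PUBMED_DICT_ID, abbrev: 'PubMed', name: 'PubMed' }] };
  }
  return { items: [] };   // filter given, but PubMed was not asked for
}
```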
This specification relates to the function:
getEntries(options, cb)
First, if options.filter.dictID is properly defined and the list of dictIDs does not include https://www.ncbi.nlm.nih.gov/pubmed, an empty array of entry objects is returned.
If options.filter.id is properly defined (with IDs like https://www.ncbi.nlm.nih.gov/pubmed/12345), then we use a query like this:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1,10,20,2&retmode=json&api_key=xyz
For the above URL, we provide a brief description for each sub-part:
- id: the comma-separated PMIDs, taken from entry IDs like https://www.ncbi.nlm.nih.gov/pubmed/12345.
- api_key: the apiKey given to the DictionaryPubMed constructor.

Otherwise (when options.filter.id is not properly defined), we get an error object back, since the API does not support the (paginated) retrieval of the information for all PubMed ids:
{
error: 'Not implemented'
}
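To make the URL construction concrete, here is a small sketch that rebuilds the example esummary query from a list of entry IDs. The function buildEsummaryUrl is hypothetical, and leaving out api_key when no key is given is an assumption of this sketch.

```javascript
// Hypothetical helper: builds an esummary URL from VSM entry IDs
// such as 'https://www.ncbi.nlm.nih.gov/pubmed/12345'.
function buildEsummaryUrl(entryIds, apiKey) {
  // The PMID is the last path segment of each entry ID.
  const pmids = entryIds.map(id => id.split('/').pop());
  let url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi'
    + '?db=pubmed&id=' + pmids.join(',') + '&retmode=json';
  // Assumption: the api_key parameter is only added when a key was provided.
  if (apiKey) url += '&api_key=' + encodeURIComponent(apiKey);
  return url;
}
```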
When using the E-utilities esummary API, we get back a JSON object with a result property whose value is the object of returned results. This object has the PMIDs as keys and, as values, objects that hold the information for each PMID (the summaries, so to say). We now provide a mapping of each PMID's information object properties to VSM-entry specific properties:
PMID field | Type | Required | VSM entry/match object property | Notes |
---|---|---|---|---|
Object.keys(result) | Array | YES | id | The VSM entry id is the full URI, not just the PMID |
Object.keys(result) | Array | YES | str , terms[i].str | The main term is 'PMID:<PMID>' |
result[PMID].authors[0].name, result[PMID].source, result[PMID].pubdate, result[PMID].title | Strings | NO | descr | The descr form is: {main author's name} ({Journal} {publication year}), {title} |
result[PMID].articleids | Array | YES | z.articleIDs | We map the whole array |
Note that the whole point of the above mapping is to have a good enough descr string, so that a user (curator) will be able to distinguish an entry article from the others (the PMID is enough for the computer, but not for humans).
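The mapping in the table above might look roughly like the following sketch. It is not the package's code: mapSummariesToEntries is a hypothetical name, the dictID property and the handling of missing optional fields are assumptions, and the publication year is assumed to be the first space-separated token of pubdate.

```javascript
const DICT_ID = 'https://www.ncbi.nlm.nih.gov/pubmed';

// Hypothetical sketch: maps the esummary `result` object to VSM entry objects.
function mapSummariesToEntries(result) {
  return Object.keys(result)
    // Keep only numeric keys (PMIDs), in case other keys are present.
    .filter(key => /^\d+$/.test(key))
    .map(pmid => {
      const s = result[pmid];
      const author = (s.authors && s.authors[0] && s.authors[0].name) || '';
      // Assumption: pubdate starts with the year, e.g. '2019 Sep 12'.
      const year = (s.pubdate || '').split(' ')[0];
      const venue = [s.source, year].filter(Boolean).join(' ');
      // descr form: {main author's name} ({Journal} {publication year}), {title}
      const descr = [author + (venue ? ' (' + venue + ')' : ''), s.title]
        .map(part => (part || '').trim())
        .filter(Boolean)
        .join(', ');
      return {
        id: DICT_ID + '/' + pmid,          // full URI, not just the PMID
        dictID: DICT_ID,                   // assumed property
        descr: descr,
        terms: [{ str: 'PMID:' + pmid }],  // the main term
        z: { articleIDs: s.articleids }    // the whole array is mapped
      };
    });
}
```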
After mapping the results to VSM objects, we sort them based on the PMID value and then prune them according to the values options.page (default: 1) and options.perPage (default: 50).
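A minimal sketch of this sort-and-prune step is given below. The helper sortAndPrune is hypothetical, and sorting the PMIDs numerically in ascending order is an assumption, since the text only says the entries are sorted by PMID value.

```javascript
// Hypothetical helper: sorts entries by PMID and returns one page of them.
function sortAndPrune(entries, options) {
  const page = (options && options.page) || 1;          // default: 1
  const perPage = (options && options.perPage) || 50;   // default: 50
  // The PMID is the last path segment of the entry id.
  const pmidOf = entry => Number(entry.id.split('/').pop());
  // Assumption: numeric, ascending order. Copy first to avoid mutating the input.
  const sorted = entries.slice().sort((a, b) => pmidOf(a) - pmidOf(b));
  return sorted.slice((page - 1) * perPage, page * perPage);
}
```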
This specification relates to the function:
getEntryMatchesForString(str, options, cb)
First, if options.filter.dictID is properly defined and the list of dictIDs does not include https://www.ncbi.nlm.nih.gov/pubmed, an empty array of match objects is returned.
Otherwise, we use two URLs: one to get the relevant PMIDs that match the requested string term (using the esearch endpoint) and one, like in the getEntries case, to get the article summaries matching the previously-found PMIDs (using the esummary endpoint).
An example of these two queries, when searching for logical modeling as str, would be:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=logical%20modeling&retmax=3&retstart=0&sort=most+recent&retmode=json
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=31515732,31407132,31347261&retmode=json
For the second URL, concerning the esummary endpoint, a description of each sub-part was given in the section above. For the first URL, concerning the esearch endpoint, we now provide a brief description for each sub-part:
- The retmax and retstart parameters define which and how many results will be included in the returned result. They depend on the options.page and options.perPage options. Default values are 50 and 0 respectively.
- The sort parameter defines the returned order of the PMIDs. The default value is most recent. Other acceptable values are: journal, pub+date, relevance, title, author. The value can be set through the constructor, for example:
const dict = new DictionaryPubMed({ sort: 'relevance' });
- The remaining parameters are as in the esummary case.

The first URL returns an object (let's call it res) and we get the PMIDs associated with the searched term str as an array of strings (the value of res.esearchresult.idlist). We then use the returned PMIDs to fill in the second URL and get back the respective article summaries, which we map to VSM-match objects as shown in the table above for the getEntries(options, cb) case.
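The two-step lookup can be sketched as two small URL builders that reproduce the example queries shown earlier. Both function names are hypothetical, the api_key parameter is left out for brevity, and computing retstart as (page - 1) * perPage is an assumption consistent with the stated defaults.

```javascript
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

// Hypothetical helper: builds the esearch URL for a search string.
function buildEsearchUrl(str, options, sort) {
  const page = (options && options.page) || 1;
  const perPage = (options && options.perPage) || 50;
  return EUTILS + 'esearch.fcgi?db=pubmed'
    + '&term=' + encodeURIComponent(str)
    + '&retmax=' + perPage
    + '&retstart=' + (page - 1) * perPage   // assumption: offset of the page
    + '&sort=' + (sort || 'most recent').replace(/ /g, '+')
    + '&retmode=json';
}

// Hypothetical helper: uses the esearch response to build the esummary URL.
function buildEsummaryUrlFromSearch(res) {
  const pmids = res.esearchresult.idlist;   // array of PMID strings
  return EUTILS + 'esummary.fcgi?db=pubmed&id=' + pmids.join(',') + '&retmode=json';
}
```

For instance, buildEsearchUrl('logical modeling', { page: 1, perPage: 3 }) yields the first example URL above, and passing an esearch response whose idlist holds the three PMIDs from the example to buildEsummaryUrlFromSearch yields the second.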
Note that the most efficient way to get back a specific article is to search using a string str that matches the PMID or the PMC or the DOI number of that article. For example, any of the following str will return one result (VSM-match object corresponding to the article):
- 7717779
- PMID:7717779
- Pmid: 7717779
- pmiD: 7717779 (note that the PMID keyword is case-insensitive)
- PMC1234567
- 10.1097/00000658-199503000-00007 (not DOI: <doi string>)

This project is licensed under the AGPL license - see LICENSE.md.