
A JavaScript tool-suite to query Wikidata and handle its results.
This library's primary purpose was to serve the needs of the inventaire project, but extending its capabilities to other needs is totally possible: feel welcome to post your suggestions as issues or pull requests!
Installation
via NPM
in a terminal at your project root:
npm install wikidata-sdk --save
then in your javascript project:
var wdk = require('wikidata-sdk')
via Bower
in a terminal at your project root:
bower install wikidata-sdk --save
then, in your project, include either
/bower_components/wikidata-sdk/dist/wikidata-sdk.js
or
/bower_components/wikidata-sdk/dist/wikidata-sdk.min.js
this will create a global object named wdk (in a browser, accessible at window.wdk)
The Old Way
Just download the raw package from this repository or, if you feel even lazier, include a <script src="https://raw.githubusercontent.com/maxlath/wikidata-sdk/master/dist/wikidata-sdk.min.js"></script> in your html to get wdk from GitHub.
In either case, this will create a global object named wdk (in a browser, accessible at window.wdk)
How-to
Build query URLs to
search in wikidata entities
var url = wdk.searchEntities('Ingmar Bergman')
this returns a query url that you are then free to request with the tool you like
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Ingmar%20Bergman&language=en&limit=20&format=json
or with more parameters:
var search = 'Ingmar Bergman'
var language = 'fr'
var limit = 10
var format = 'json'
var url = wdk.searchEntities(search, language, limit, format)
which can also be passed as an object:
var url = wdk.searchEntities({
search: 'Ingmar Bergman',
format: 'xml',
language: 'sv'
})
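Once you have the URL, requesting it and reading the answer could look like the sketch below. It assumes a runtime with a global fetch (a browser or a recent Node.js) and relies on the search array found in wbsearchentities JSON responses. extractSearchResults and searchWikidata are hypothetical helper names, not part of wikidata-sdk, and the sample values are only illustrative.

```javascript
// Hypothetical helper (not part of wikidata-sdk): keeps only the id, label
// and description of each result in a wbsearchentities JSON response
function extractSearchResults (body) {
  return (body.search || []).map(function (result) {
    return { id: result.id, label: result.label, description: result.description }
  })
}

// Hypothetical helper: requests a URL built by wdk.searchEntities.
// Assumes a global fetch is available
function searchWikidata (url) {
  return fetch(url)
    .then(function (res) {
      if (!res.ok) throw new Error('request failed with status ' + res.status)
      return res.json()
    })
    .then(extractSearchResults)
}

// Trimmed-down example of the response shape, with illustrative values
var sampleBody = {
  search: [
    { id: 'Q7546', label: 'Ingmar Bergman', description: 'Swedish director' }
  ]
}
var results = extractSearchResults(sampleBody)
console.log(results[0].id)
```

Keeping the parsing in its own function means it can be reused with whatever request tool you prefer.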
get entities by id
on the same pattern
var ids = 'Q571'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'xml'
var url = wdk.getEntities(ids, languages, properties, format)
properties being Wikidata entities' properties: info, sitelinks, labels, descriptions, claims.
ids, languages, properties can each take either a single value as a string or several values in an array
And again, this can also be passed as an object:
var url = wdk.getEntities({
ids: ['Q1', 'Q5', 'Q571'],
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'xml'
})
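To illustrate the string-or-array rule, here is a rough sketch of how such arguments could be turned into a wbgetentities URL, where the MediaWiki API expects multiple values separated by a pipe. buildGetEntitiesUrl and toPipedList are hypothetical names, the defaults are assumptions, and the resulting URL is not guaranteed to match what wdk.getEntities returns character for character.

```javascript
// Hypothetical helper (not the wikidata-sdk implementation):
// accepts a string or an array and returns a pipe-separated list
function toPipedList (value) {
  return (Array.isArray(value) ? value : [value]).join('|')
}

// Hypothetical helper: builds a wbgetentities URL by hand.
// The default language, props and format are assumptions made for this sketch
function buildGetEntitiesUrl (ids, languages, properties, format) {
  var params = {
    action: 'wbgetentities',
    ids: toPipedList(ids),
    languages: toPipedList(languages || 'en'),
    props: toPipedList(properties || ['info', 'sitelinks', 'labels', 'descriptions', 'claims']),
    format: format || 'json'
  }
  var query = Object.keys(params).map(function (key) {
    return key + '=' + encodeURIComponent(params[key])
  }).join('&')
  return 'https://www.wikidata.org/w/api.php?' + query
}

var entitiesUrl = buildGetEntitiesUrl(['Q1', 'Q5', 'Q571'], ['en', 'fr', 'de'], ['info', 'claims'], 'xml')
console.log(entitiesUrl)
```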
get entities by Wikipedia titles
This can be very useful when you work with a list of Wikipedia articles in a given language and would like to move to Wikidata for all the awesomeness it provides:
var url = wdk.getWikidataIdsFromWikipediaTitles('Hamburg')
var url = wdk.getWikidataIdsFromWikipediaTitles(['Hamburg', 'Lyon', 'Berlin'])
By default, it looks in the English Wikipedia, but we can change that:
var titles = 'Hamburg'
var sites = 'dewiki'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'json'
var url = wdk.getWikidataIdsFromWikipediaTitles(titles, sites, languages, properties, format)
or using the object interface:
var url = wdk.getWikidataIdsFromWikipediaTitles({
titles: 'Hamburg',
sites: 'dewiki',
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'json'
})
get entities by other Wikimedia projects titles
This is exactly the same interface as with getWikidataIdsFromWikipediaTitles, you just need to specify the sitelink in the form {2 letters language code}{project}:
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwikisource')
Actually, getWikidataIdsFromWikipediaTitles is just an alias of getWikidataIdsFromSitelinks, so you can use it for Wikipedia too:
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwiki')
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'fr')
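Since both 'frwiki' and 'fr' are accepted, a bare language code presumably gets expanded to a Wikipedia sitelink key at some point. The sketch below shows one way this could be done. It is an assumption about the behavior, not the library's actual code, and normalizeSitelink is a hypothetical name.

```javascript
// Hypothetical helper: turns a bare 2-letter language code into a Wikipedia
// sitelink key, and leaves full keys such as 'frwikisource' untouched
function normalizeSitelink (site) {
  return /^[a-z]{2}$/.test(site) ? site + 'wiki' : site
}

console.log(normalizeSitelink('fr'))
console.log(normalizeSitelink('frwikisource'))
```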
get entities reverse claims
/!\ WDQ will be deprecated, use the [SPARQL endpoint](#sparql-queries) instead
In wikidata API answers, you can only access claims on the entity's page, not claims pointing to this entity (what would be in the "what links here" page).
Fortunately, you can query Wikimedia's awesome WDQ tool \o/
(And now also an even more awesome [SPARQL endpoint](#sparql-queries))
For instance, let's say you want to find all the entities that have Leo Tolstoy (Q7243) for author (P50)
var url = wdk.getReverseClaims('P50', 'Q7243')
and you can then query the obtained entities ids
request(url, function(err, response){
if (err) { dealWithError(err) }
var entities = wdk.parse.wdq.entities(response)
var url2 = wdk.getEntities(entities)
request(url2 ....
})
it also works for string values: e.g. let's say you want to find which book has 978-0-465-06710-7 for ISBN-13 (P212):
var url = wdk.getReverseClaims('P212', '978-0-465-06710-7')
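As WDQ is going away, the same reverse lookups can be written as SPARQL queries. Here is a minimal sketch, assuming the wd: and wdt: prefixes used by the Wikidata SPARQL endpoint. reverseClaimsSparql is a hypothetical helper, not part of wikidata-sdk.

```javascript
// Hypothetical helper (not part of wikidata-sdk): builds a SPARQL query
// listing the entities that have `value` for `property`
function reverseClaimsSparql (property, value) {
  // Q ids become wd: entities, anything else is treated as a string literal
  var object = /^Q[1-9]\d*$/.test(value) ? 'wd:' + value : JSON.stringify(value)
  return [
    'PREFIX wd: <http://www.wikidata.org/entity/>',
    'PREFIX wdt: <http://www.wikidata.org/prop/direct/>',
    'SELECT ?subject WHERE {',
    '  ?subject wdt:' + property + ' ' + object + ' .',
    '}'
  ].join('\n')
}

var tolstoyQuery = reverseClaimsSparql('P50', 'Q7243')
var isbnQuery = reverseClaimsSparql('P212', '978-0-465-06710-7')
console.log(tolstoyQuery)
```

The resulting string can then be passed to wdk.sparqlQuery to get a URL.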
### sparql queries
But now, there is something even more powerful than WDQ: the almighty Wikidata SPARQL endpoint! SPARQL can be a weird thing at first, but the Wikidata team and community really put a lot of effort into making things easy, with a user manual, an awesome tool to test your queries with autocomplete, and lots of examples!
Then, to get JSON results you can make an HTTP query to https://query.wikidata.org/sparql?query={SPARQL}&format=json, which with wdk can be done like this:
var url = wdk.sparqlQuery(SPARQL)
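The important detail in that URL is that the SPARQL text has to be URL-encoded before being used as the query parameter. A hand-written equivalent could look like this sketch, where buildSparqlUrl is a hypothetical name and only the endpoint address comes from the text above.

```javascript
// Hypothetical helper: builds the endpoint URL described above by hand
function buildSparqlUrl (sparql) {
  return 'https://query.wikidata.org/sparql?query=' + encodeURIComponent(sparql) + '&format=json'
}

var sampleQuery = 'SELECT ?work WHERE { ?work wdt:P50 wd:Q535 . }'
var sparqlUrl = buildSparqlUrl(sampleQuery)
console.log(sparqlUrl)
```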
Example taken from inventaire SPARQL queries (here written using ES6 template string capabilities)
var authorQid = 'Q535'
var sparql = `
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?work ?date WHERE {
?work wdt:P50 wd:${authorQid} .
OPTIONAL {
?work wdt:P577 ?date .
}
}
`
var url = wdk.sparqlQuery(sparql)
Querying this url should return a big collection of objects with work and date attributes corresponding to all of Mr Q535's works
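The endpoint answers in the standard SPARQL JSON results format, where every value is wrapped in an object and entities come as full URIs. The following sketch flattens such a response into plain objects. simplifySparqlResults is a hypothetical helper and the sample data is illustrative only.

```javascript
// Hypothetical helper: flattens SPARQL JSON results into plain objects,
// turning entity URIs into bare Wikidata ids
var entityPrefix = 'http://www.wikidata.org/entity/'

function simplifySparqlResults (body) {
  return body.results.bindings.map(function (binding) {
    var row = {}
    Object.keys(binding).forEach(function (name) {
      var value = binding[name].value
      row[name] = value.indexOf(entityPrefix) === 0 ? value.slice(entityPrefix.length) : value
    })
    return row
  })
}

// Illustrative response: the second work has no date, as allowed by OPTIONAL
var sampleResults = {
  head: { vars: ['work', 'date'] },
  results: {
    bindings: [
      {
        work: { type: 'uri', value: 'http://www.wikidata.org/entity/Q180736' },
        date: { type: 'literal', datatype: 'http://www.w3.org/2001/XMLSchema#dateTime', value: '1862-01-01T00:00:00Z' }
      },
      { work: { type: 'uri', value: 'http://www.wikidata.org/entity/Q191380' } }
    ]
  }
}

var rows = simplifySparqlResults(sampleResults)
console.log(rows)
```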
Results parsers
Wikidata API queries
you can pass the results from wdk.searchEntities, wdk.getEntities, wdk.getWikidataIdsFromWikipediaTitles, or wdk.getWikidataIdsFromSitelinks to wdk.parse.wd.entities: it will return entities with simplified claims (cf. "simplify claims results" hereafter)
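To give an idea of the data those parsers receive, here is a sketch that walks the entities object of a wbgetentities JSON response and picks a label per entity. It does not reproduce what wdk.parse.wd.entities does. summarizeEntities is a hypothetical helper and the sample response is trimmed down to the parts it reads.

```javascript
// Hypothetical helper: picks, for each entity of a wbgetentities JSON response,
// its id and the first label available among the preferred languages
function summarizeEntities (body, languages) {
  return Object.keys(body.entities).map(function (id) {
    var labels = body.entities[id].labels || {}
    var lang = languages.filter(function (l) { return labels[l] })[0]
    return { id: id, label: lang ? labels[lang].value : null }
  })
}

// Trimmed-down example of the response shape
var sampleResponse = {
  entities: {
    Q571: {
      id: 'Q571',
      labels: { en: { language: 'en', value: 'book' }, fr: { language: 'fr', value: 'livre' } }
    },
    Q535: {
      id: 'Q535',
      labels: { fr: { language: 'fr', value: 'Victor Hugo' } }
    }
  }
}

var summary = summarizeEntities(sampleResponse, ['en', 'fr'])
console.log(summary)
```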
WDQ queries
you can pass the results from wdk.getReverseClaims to wdk.parse.wdq.entities: it will return a list of Wikidata entities Q ids
simplify claims results
For each entity's claims, Wikidata's API returns a deep object that requires some parsing that could be avoided for simple uses.
So instead of:
"P279": [
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 340169,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$0115863d-4f02-0337-38c2-5e2bb7a0f628"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 2342494,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$04c87c4e-4bce-a9ab-eb75-d9a3ed695077"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 386724,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$afe3b5c3-424e-eb7b-60e6-c2ce0d122823"
}
]
we could have
"P279": [ "Q340169", "Q2342494", "Q386724" ]
you just need to pass your entity's claims object to simplifyClaims as such:
var simpleClaims = wdk.simplifyClaims(claims)
in your workflow, that could give something like:
var url = wdk.getEntities('Q535')
request(url, function(err, response){
if (err) { dealWithError(err) }
var entity = response.entities.Q535
entity.claims = wdk.simplifyClaims(entity.claims)
})
To keep things simple, "weird" values are removed (for instance, statements of datatype wikibase-item but set to somevalue instead of the expected Q id)
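For the curious, the transformation can be approximated in a few lines. This is a simplified sketch limited to wikibase-item claims, not the actual simplifyClaims code: it keeps statements that have a real value and drops the others, as described above. simplifyItemClaims is a hypothetical name.

```javascript
// Hypothetical re-implementation, for wikibase-item claims only
function simplifyItemClaims (claims) {
  var simplified = {}
  Object.keys(claims).forEach(function (property) {
    simplified[property] = claims[property]
      // Drop statements without an actual entity value
      // (snaktype somevalue or novalue, or another kind of datavalue)
      .filter(function (statement) {
        var mainsnak = statement.mainsnak
        return mainsnak.snaktype === 'value' && mainsnak.datavalue.type === 'wikibase-entityid'
      })
      // Rebuild the Q id from its numeric part
      .map(function (statement) {
        return 'Q' + statement.mainsnak.datavalue.value['numeric-id']
      })
  })
  return simplified
}

var claims = {
  P279: [
    {
      mainsnak: {
        snaktype: 'value',
        property: 'P279',
        datatype: 'wikibase-item',
        datavalue: { type: 'wikibase-entityid', value: { 'numeric-id': 340169, 'entity-type': 'item' } }
      }
    },
    { mainsnak: { snaktype: 'somevalue', property: 'P279', datatype: 'wikibase-item' } }
  ]
}

var simpleItemClaims = simplifyItemClaims(claims)
console.log(simpleItemClaims)
```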
Other utils
- isNumericId
- getNumericId
- isWikidataId
- isWikidataEntityId
- isWikidataPropertyId
- normalizeId
- normalizeIds
- wikidataTimeToDateObject
- wikidataTimeToEpochTime
- wikidataTimeToISOString
- normalizeWikidataTime (aliased to wikidataTimeToEpochTime)
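These helpers are not detailed here, so the sketch below only guesses at what id checks and time conversion might involve, based on the function names. looksLikeEntityId, looksLikePropertyId and timeToDate are hypothetical stand-ins, not the library's functions. The time conversion assumes Wikidata's signed time strings (such as +1869-11-01T00:00:00Z) and falls back to 01 when the month or day is 00, which Wikidata uses for lower-precision dates.

```javascript
// Hypothetical id checks: a letter followed by a positive integer
function looksLikeEntityId (id) { return /^Q[1-9]\d*$/.test(id) }
function looksLikePropertyId (id) { return /^P[1-9]\d*$/.test(id) }

// Hypothetical conversion of a Wikidata time string to a Date object.
// Years with more than 6 digits are not handled by this sketch
function timeToDate (time) {
  var match = /^([+-])(\d+)-(\d{2})-(\d{2})T(.+)$/.exec(time)
  if (!match) throw new Error('unexpected time format: ' + time)
  // JavaScript expects signed years to be written with 6 digits
  var year = match[1] + match[2].padStart(6, '0')
  // Lower-precision dates use 00 for the month or day: fall back to 01
  var month = match[3] === '00' ? '01' : match[3]
  var day = match[4] === '00' ? '01' : match[4]
  return new Date(year + '-' + month + '-' + day + 'T' + match[5])
}

var date = timeToDate('+1869-11-01T00:00:00Z')
console.log(date.toISOString())
console.log(date.getTime())
```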
A little CoffeeScript and promises workflow demo, that's how I love to work :)
breq = require 'bluereq' # a little request lib returning bluebird-based promises
ids = ['Q647268', 'Q771376', 'Q860998', 'Q965704']
url = wdk.getEntities ids, user.language
breq.get(url)
.then wdk.parse.wd.entities
.then (entities)-> # do useful stuff with those entities data
License
MIT