
A JavaScript tool suite to query Wikidata and handle its results.
This library was primarily built to serve the needs of the inventaire project, but extending its capabilities to other needs is entirely possible: feel welcome to post your suggestions as issues or pull requests!
Installation
via NPM
in a terminal at your project root:
npm install wikidata-sdk --save
then in your javascript project:
var wdk = require('wikidata-sdk')
via Bower
in a terminal at your project root:
bower install wikidata-sdk --save
then, in your project, include either
/bower_components/wikidata-sdk/dist/wikidata-sdk.js
or
/bower_components/wikidata-sdk/dist/wikidata-sdk.min.js
this will create a global object named wdk (in a browser, accessible at window.wdk)
The Old Way
Just download the raw package from this repository or, lazier still, include a <script src="https://raw.githubusercontent.com/maxlath/wikidata-sdk/master/dist/wikidata-sdk.min.js"></script> in your HTML to get wdk from GitHub.
In either case, this will create a global object named wdk (in a browser, accessible at window.wdk)
How-to
Build query URLs to
search in wikidata entities
associated Wikidata doc: wbsearchentities
var url = wdk.searchEntities('Ingmar Bergman')
this returns a query url that you are then free to request with the tool you like
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Ingmar%20Bergman&language=en&limit=20&format=json
or with more parameters:
var search = 'Ingmar Bergman'
var language = 'fr'
var limit = 10
var format = 'json'
var url = wdk.searchEntities(search, language, limit, format)
which can also be passed as an object:
var url = wdk.searchEntities({
search: 'Ingmar Bergman',
format: 'xml',
language: 'sv'
})
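To make the shape of the returned URL more concrete, here is a minimal sketch that rebuilds the sample URL shown above by hand. buildSearchUrl is a hypothetical helper, not part of wikidata-sdk, and the defaults (en, 20, json) are only assumed from that sample URL; uselang is left out for brevity.

```javascript
// Hypothetical helper, NOT the library's implementation:
// rebuilds a wbsearchentities URL from its parameters.
function buildSearchUrl (search, language, limit, format) {
  var params = {
    action: 'wbsearchentities',
    search: search,
    language: language || 'en', // default assumed from the sample URL
    limit: limit || 20, // default assumed from the sample URL
    format: format || 'json' // default assumed from the sample URL
  }
  // Percent-encode each value and join the pairs with '&'
  var query = Object.keys(params)
    .map(function (key) {
      return key + '=' + encodeURIComponent(params[key])
    })
    .join('&')
  return 'https://www.wikidata.org/w/api.php?' + query
}

var searchUrl = buildSearchUrl('Ingmar Bergman')
console.log(searchUrl)
```

Passing 'fr' and 10 as second and third arguments would only change the language and limit parameters in the resulting URL.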
By default, the uselang parameter (the language in which the search results are returned) is set to the same value as the language passed, but if for some weird use case you need to set a different language, you can still pass a 2-letter language code:
- as last argument (inline interface)
var uselang = 'eo'
var url = wdk.searchEntities(search, language, limit, format, uselang)
- or set uselang in the option object (object interface).
var url = wdk.searchEntities({
search: 'Ingmar Bergman',
language: 'sv',
uselang: 'eo'
})
If the values aren't available in the desired language, it will fall back to the English value if available.
get entities by id
associated Wikidata doc: wbgetentities
on the same pattern
var ids = 'Q571'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'xml'
var url = wdk.getEntities(ids, languages, properties, format)
properties being wikidata entities' properties: info, sitelinks, labels, descriptions, claims.
ids, languages, properties can each take either a single value as a string or several values in an array
And again, this can also be passed as an object:
var url = wdk.getEntities({
ids: ['Q1', 'Q5', 'Q571'],
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'xml'
})
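As a rough illustration of the "string or array" convention, the sketch below normalizes each argument to a list and joins it with the pipe character that the wbgetentities API expects between multiple values. toList and buildGetEntitiesUrl are hypothetical names used for illustration only, not the library's own code.

```javascript
// Hypothetical helpers, NOT part of wikidata-sdk.

// Accept either a single string or an array of strings
function toList (value) {
  return Array.isArray(value) ? value : [value]
}

function buildGetEntitiesUrl (ids, languages, properties, format) {
  var params = {
    action: 'wbgetentities',
    ids: toList(ids).join('|'),
    languages: toList(languages).join('|'),
    props: toList(properties).join('|'), // the API parameter is named "props"
    format: format || 'json'
  }
  // encodeURIComponent turns each '|' into '%7C'
  var query = Object.keys(params)
    .map(function (key) {
      return key + '=' + encodeURIComponent(params[key])
    })
    .join('&')
  return 'https://www.wikidata.org/w/api.php?' + query
}

console.log(buildGetEntitiesUrl('Q571', ['en', 'fr', 'de'], ['info', 'claims'], 'xml'))
```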
get entities by Wikipedia titles
associated Wikidata doc: wbgetentities
This can be very useful when you work with a list of Wikipedia articles in a given language and would like to move to Wikidata for all the awesomeness it provides:
var url = wdk.getWikidataIdsFromWikipediaTitles('Hamburg')
var url = wdk.getWikidataIdsFromWikipediaTitles(['Hamburg', 'Lyon', 'Berlin'])
By default, it looks in the English Wikipedia, but we can change that:
var titles = 'Hamburg'
var sites = 'dewiki'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'json'
var url = wdk.getWikidataIdsFromWikipediaTitles(titles, sites, languages, properties, format)
or using the object interface:
var url = wdk.getWikidataIdsFromWikipediaTitles({
titles: 'Hamburg',
sites: 'dewiki',
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'json'
})
get entities by other Wikimedia projects titles
associated Wikidata doc: wbgetentities
This is exactly the same interface as with getWikidataIdsFromWikipediaTitles, you just need to specify the sitelink in the form {2-letter language code}{project}
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwikisource')
Actually, getWikidataIdsFromWikipediaTitles is just an alias of getWikidataIdsFromSitelinks, so you can use it for Wikipedia too:
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwiki')
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'fr')
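Since both 'fr' and 'frwiki' are accepted for Wikipedia, one plausible way to handle this is to expand a bare 2-letter code into a Wikipedia sitelink. The sketch below is only a guess at that behavior: normalizeSite is a hypothetical name and the library may well do it differently.

```javascript
// Hypothetical helper, NOT part of wikidata-sdk:
// expands a bare 2-letter language code into a Wikipedia sitelink name
// and leaves full sitelink names untouched.
function normalizeSite (site) {
  return /^[a-z]{2}$/.test(site) ? site + 'wiki' : site
}

console.log(normalizeSite('fr'))
console.log(normalizeSite('frwiki'))
console.log(normalizeSite('frwikisource'))
```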
get entities reverse claims
/!\ WDQ will be deprecated, use the SPARQL endpoint instead
In wikidata API answers, you can only access claims on the entity's page, not claims pointing to this entity (what would be in the "what links here" page).
Fortunately, you can query Wikimedia's awesome WDQ tool \o/
(And now also an even more awesome SPARQL endpoint)
For instance, let's say you want to find all the entities that have Leo Tolstoy (Q7243) for author (P50)
var url = wdk.getReverseClaims('P50', 'Q7243')
and you can then query the obtained entities' ids
request(url, function(err, response){
if (err) { return dealWithError(err) }
var entities = wdk.parse.wdq.entities(response)
var url2 = wdk.getEntities(entities)
request(url2 ....
})
it also works for string values: e.g. let's say you want to find which book has 978-0-465-06710-7 for ISBN-13 (P212):
var url = wdk.getReverseClaims('P212', '978-0-465-06710-7')
sparql queries
But now there is something even more powerful than WDQ: the almighty Wikidata SPARQL endpoint!
SPARQL can be a weird thing at first, but the Wikidata team and community really put a lot of effort into making things easy, with a user manual, an awesome tool to test your queries with autocomplete, and lots of examples!
Then, to get JSON results you can make an HTTP query to https://query.wikidata.org/sparql?query={SPARQL}&format=json, which with wdk can be done like this:
var url = wdk.sparqlQuery(SPARQL)
Example taken from inventaire SPARQL queries (here written using ES6 template string capabilities)
var authorQid = 'Q535'
var sparql = `
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?work ?date WHERE {
?work wdt:P50 wd:${authorQid} .
OPTIONAL {
?work wdt:P577 ?date .
}
}
`
var url = wdk.sparqlQuery(sparql)
Querying this url should return a big collection of objects with work and date attributes corresponding to all Mr Q535's works
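To show what such a response looks like once fetched, here is a small sketch that builds the endpoint URL by hand and flattens a response in the standard SPARQL JSON results format into plain objects. buildSparqlUrl and flattenBindings are hypothetical helpers, not library functions, and the sample response uses made-up placeholder values rather than real query output.

```javascript
// Hypothetical helpers, NOT part of wikidata-sdk.

// Same URL pattern as described above, with the query percent-encoded
function buildSparqlUrl (sparql) {
  return 'https://query.wikidata.org/sparql?query=' +
    encodeURIComponent(sparql) + '&format=json'
}

// Turn each binding { work: { type, value }, ... } into { work: value, ... }
function flattenBindings (response) {
  return response.results.bindings.map(function (binding) {
    var row = {}
    Object.keys(binding).forEach(function (name) {
      row[name] = binding[name].value
    })
    return row
  })
}

// Made-up placeholder response, only here to illustrate the format
var sampleResponse = {
  head: { vars: ['work', 'date'] },
  results: {
    bindings: [
      {
        work: { type: 'uri', value: 'http://www.wikidata.org/entity/Q123' },
        date: { type: 'literal', value: '1862-01-01T00:00:00Z' }
      },
      // The date is OPTIONAL in the query, so it can be missing
      { work: { type: 'uri', value: 'http://www.wikidata.org/entity/Q456' } }
    ]
  }
}

console.log(flattenBindings(sampleResponse))
```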
Results parsers
Wikidata API queries
you can pass the results from wdk.searchEntities, wdk.getEntities, wdk.getWikidataIdsFromWikipediaTitles, or wdk.getWikidataIdsFromSitelinks to wdk.parse.wd.entities: it will return entities with simplified claims (cf. "Simplify claims results" hereafter)
WDQ queries
you can pass the results from wdk.getReverseClaims to wdk.parse.wdq.entities: it will return a list of Wikidata entities' Q ids
Simplify claims results
associated Wikidata doc: DataModel
For each entity's claims, Wikidata's API returns a deep object that requires some parsing that could be avoided for simple uses.
So instead of:
"P279": [
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 340169,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$0115863d-4f02-0337-38c2-5e2bb7a0f628"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 2342494,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$04c87c4e-4bce-a9ab-eb75-d9a3ed695077"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 386724,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$afe3b5c3-424e-eb7b-60e6-c2ce0d122823"
}
]
we could have
"P279": [ "Q340169", "Q2342494", "Q386724" ]
That's what simplifyClaims, simplifyPropertyClaims, simplifyClaim do, each at their own level:
simplifyClaims
you just need to pass your entity's claims object to simplifyClaims as such:
var simplifiedClaims = wdk.simplifyClaims(entity.claims)
in your workflow, that could give something like:
var url = wdk.getEntities('Q535')
request(url, function(err, response){
if (err) { return dealWithError(err) }
var entity = response.entities.Q535
var simplifiedClaims = wdk.simplifyClaims(entity.claims)
})
To keep things simple, "weird" values are removed (for instance, statements of datatype wikibase-item but set to somevalue instead of the expected Q id)
simplifyPropertyClaims
Same as simplifyClaims but expects an array of claims, typically the array of claims of a specific property:
var simplifiedP31Claims = wdk.simplifyPropertyClaims(entity.claims.P31)
simplifyClaim
Same as simplifyClaims but expects a single claim
var simplifiedP31Claim = wdk.simplifyClaim(entity.claims.P31[0])
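To illustrate how the three levels fit together, here is a minimal sketch of the same idea in plain JavaScript. It is not the library's implementation: the function names are hypothetical, and it only handles wikibase-item values, dropping snaks that carry no value.

```javascript
// Hypothetical re-implementation for illustration, NOT wikidata-sdk's code.

// Claim level: return a Q id, a raw value, or null for "weird" values
function sketchSimplifyClaim (claim) {
  var snak = claim.mainsnak
  if (!snak || snak.snaktype !== 'value') return null
  var datavalue = snak.datavalue
  if (datavalue.type === 'wikibase-entityid') {
    return 'Q' + datavalue.value['numeric-id'] // items only, for brevity
  }
  return datavalue.value
}

// Property level: simplify an array of claims and drop the nulls
function sketchSimplifyPropertyClaims (claims) {
  return claims.map(sketchSimplifyClaim).filter(function (value) {
    return value !== null
  })
}

// Entity level: simplify every property of a claims object
function sketchSimplifyClaims (claimsByProperty) {
  var simplified = {}
  Object.keys(claimsByProperty).forEach(function (property) {
    simplified[property] = sketchSimplifyPropertyClaims(claimsByProperty[property])
  })
  return simplified
}

// Trimmed-down version of the P279 example above, plus one valueless snak
var sampleClaims = {
  P279: [
    { mainsnak: { snaktype: 'value', datavalue: { type: 'wikibase-entityid', value: { 'numeric-id': 340169, 'entity-type': 'item' } } } },
    { mainsnak: { snaktype: 'value', datavalue: { type: 'wikibase-entityid', value: { 'numeric-id': 2342494, 'entity-type': 'item' } } } },
    { mainsnak: { snaktype: 'somevalue' } }
  ]
}

console.log(sketchSimplifyClaims(sampleClaims))
```

Each level simply delegates to the one below it, which is why the three functions can be used independently depending on how much of the entity you have at hand.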
Other utils
- isNumericId
- getNumericId
- isWikidataId
- isWikidataEntityId
- isWikidataPropertyId
- normalizeId
- normalizeIds
- wikidataTimeToDateObject
- wikidataTimeToEpochTime
- wikidataTimeToISOString
- normalizeWikidataTime (aliased to wikidataTimeToEpochTime)
that's how I love to work :) (a CoffeeScript example using promises)
breq = require 'bluereq' # a little request lib returning bluebird-based promises
ids = ['Q647268', 'Q771376', 'Q860998', 'Q965704']
url = wdk.getEntities ids, user.language
breq.get(url)
.then wdk.parse.wd.entities
.then (entities)-> # do useful stuff with those entities data
CLI
Now some goodies from the command line!
Executables are grouped in the bin folder, so you can execute them using their file path (e.g. ./bin/qlabel), but it is way more convenient to have them globally accessible (e.g. qlabel), and for that, wikidata-sdk must be installed globally:
npm install -g wikidata-sdk
qlabel
Working with Wikidata, we often end up with obscure ids. We can always look up those ids' labels on the website, but that means loading pages and pages, when a small API call and some parsing could return just what we need: a label.
qlabel Q1103345
By default, the result is in English, but we can pass a 2-letter language code as second argument
qlabel Q1103345 de
wikiqid
This one is kind of the other way around: pass it the title of a Wikipedia article and it will return the corresponding Wikidata id
wikiqid Cantabria
wikiqid New Delhi
By default, it will look at the English Wikipedia, but you can specify another language by passing a 2-letter language code as last argument
wikiqid science politique fr
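The examples above suggest that the title can span several words and that a trailing 2-letter word is read as the language. As a rough sketch of how such arguments could be split (parseWikiqidArgs is a hypothetical helper, and the actual executable may parse its arguments differently):

```javascript
// Hypothetical helper, NOT the actual wikiqid implementation.
// Takes the command line arguments (e.g. process.argv.slice(2))
// and separates the article title from an optional language code.
function parseWikiqidArgs (args) {
  var last = args[args.length - 1]
  // Treat the last word as a language only if it is not the whole title
  var hasLanguage = args.length > 1 && /^[a-z]{2}$/.test(last)
  var titleWords = hasLanguage ? args.slice(0, -1) : args
  return {
    title: titleWords.join(' '),
    language: hasLanguage ? last : 'en' // English by default, as stated above
  }
}

console.log(parseWikiqidArgs(['New', 'Delhi']))
console.log(parseWikiqidArgs(['science', 'politique', 'fr']))
```

One limitation of this approach: a title whose last word is itself two lowercase letters would be mistaken for a language code.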
License
MIT