
A JavaScript tool suite to query Wikidata and handle its results.
This library was primarily built to serve the needs of the inventaire project, but extending its capabilities to other needs is totally possible: feel welcome to post your suggestions as issues or pull requests!
used APIs:
Summary
Installation
via NPM
in a terminal at your project root:
npm install wikidata-sdk --save
then in your javascript project:
var wdk = require('wikidata-sdk')
via Bower
in a terminal at your project root:
bower install wikidata-sdk --save
then, in your project, include either
/bower_components/wikidata-sdk/dist/wikidata-sdk.js
or
/bower_components/wikidata-sdk/dist/wikidata-sdk.min.js
This will create a global object named wdk (in a browser, accessible at window.wdk).
The Old Way
Just download the raw package from this repository or, lazier still, include <script src="https://raw.githubusercontent.com/maxlath/wikidata-sdk/master/dist/wikidata-sdk.min.js"></script> in your HTML to get wdk from GitHub.
In either case, this will create a global object named wdk (in a browser, accessible at window.wdk).
How-to
Build query URLs to
search in wikidata entities
associated Wikidata doc: wbsearchentities
var url = wdk.searchEntities('Ingmar Bergman')
this returns a query url that you are then free to request with the tool you like
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Ingmar%20Bergman&language=en&limit=20&format=json
or with more parameters:
var search = 'Ingmar Bergman'
var language = 'fr'
var limit = 10
var format = 'json'
var url = wdk.searchEntities(search, language, limit, format)
which can also be passed as an object:
var url = wdk.searchEntities({
search: 'Ingmar Bergman',
format: 'xml',
language: 'sv'
})
By default, the uselang parameter (the language in which the search results are returned) is set to the same value as the language parameter, but if some unusual use case requires a different language, you can still pass a two-letter language code:
- as last argument (inline interface)
var uselang = 'eo'
var url = wdk.searchEntities(search, language, limit, format, uselang)
- or set uselang in the option object (object interface):
var url = wdk.searchEntities({
search: 'Ingmar Bergman',
language: 'sv',
uselang: 'eo'
})
If the values aren't available in the desired language, it will fall back to the English value if available.
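To make those defaults concrete, here is a minimal sketch of how such a url could be assembled. `buildSearchUrl` is a hypothetical helper written for illustration, not the library's implementation: the parameter order and the exact set of parameters wdk emits may differ.

```javascript
// Hypothetical helper, NOT part of wikidata-sdk: it only mirrors the defaults
// described above (language 'en', limit 20, format 'json', uselang = language).
function buildSearchUrl (options) {
  const search = options.search
  const language = options.language || 'en'
  const limit = options.limit || 20
  const format = options.format || 'json'
  // uselang falls back to the search language when not set explicitly
  const uselang = options.uselang || language

  const params = { action: 'wbsearchentities', search, language, limit, format, uselang }
  const query = Object.keys(params)
    .map((key) => key + '=' + encodeURIComponent(params[key]))
    .join('&')

  return 'https://www.wikidata.org/w/api.php?' + query
}

const defaultUrl = buildSearchUrl({ search: 'Ingmar Bergman' })
const esperantoUrl = buildSearchUrl({ search: 'Ingmar Bergman', language: 'sv', uselang: 'eo' })
console.log(defaultUrl)
console.log(esperantoUrl)
```

In the first call uselang simply follows language, while the second one searches Swedish labels but asks for results displayed in Esperanto.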
get entities by id
associated Wikidata doc: wbgetentities
on the same pattern
var ids = 'Q571'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'xml'
var url = wdk.getEntities(ids, languages, properties, format)
properties being wikidata entities' properties: info, sitelinks, labels, descriptions, claims.
ids, languages, and properties can each take either a single value as a string or several values in an array.
And again, this can also be passed as an object:
var url = wdk.getEntities({
ids: ['Q1', 'Q5', 'Q571'],
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'xml'
})
get many entities by id
Above 50 ids, wdk.getEntities will warn you that your request won't be fully fulfilled by the Wikidata API due to its limitations policy.
You can use wdk.getManyEntities instead to generate several request urls to work around this limitation.
The arguments API is the same as getEntities:
var urls = wdk.getManyEntities(['Q1', 'Q2', 'Q3', ..., 'Q123'])
var urls = wdk.getManyEntities(['Q1', 'Q2', 'Q3', ..., 'Q123'], ['en', 'fr', 'de'], ['info', 'claims'], 'json')
var urls = wdk.getManyEntities({
ids: ['Q1', 'Q2', 'Q3', ..., 'Q123'],
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'json'
})
but it returns an array of urls instead.
:warning: This limitation policy was probably there for a reason, right? This should be the exception, make sure to set an interval between your requests (500ms, 1s?), and if you really need a lot of entities, consider using dumps: there are great tools to work with those too! ;)
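The idea of splitting a long list of ids into groups of at most 50, and of spacing out the resulting requests, can be sketched as follows. `chunkIds` and `requestSequentially` are hypothetical helpers written for illustration, and `requestFn` stands for whatever promise-returning request function you use: none of them is part of wdk.

```javascript
// Hypothetical helper: split a list of ids into groups of at most `size` ids
function chunkIds (ids, size) {
  const chunks = []
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size))
  }
  return chunks
}

// Hypothetical helper: request the urls one after the other,
// waiting `delayMs` milliseconds after each request.
// `requestFn` is assumed to take a url and return a promise.
async function requestSequentially (urls, requestFn, delayMs) {
  const results = []
  for (const url of urls) {
    results.push(await requestFn(url))
    await new Promise((resolve) => setTimeout(resolve, delayMs))
  }
  return results
}

// 123 ids: Q1, Q2, ..., Q123
const manyIds = Array.from({ length: 123 }, (_, i) => 'Q' + (i + 1))
const idGroups = chunkIds(manyIds, 50)
console.log(idGroups.map((group) => group.length)) // [ 50, 50, 23 ]
```

With the urls returned by wdk.getManyEntities, something like requestSequentially(urls, yourRequestFunction, 1000) would keep one second between two requests.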
get entities by Wikipedia titles
associated Wikidata doc: wbgetentities
This can be very useful when you work with a list of Wikipedia articles in a given language and would like to move to Wikidata for all the awesomeness it provides:
var url = wdk.getWikidataIdsFromWikipediaTitles('Hamburg')
var url = wdk.getWikidataIdsFromWikipediaTitles(['Hamburg', 'Lyon', 'Berlin'])
By default, it looks in the English Wikipedia, but we can change that:
var titles = 'Hamburg'
var sites = 'dewiki'
var languages = ['en', 'fr', 'de']
var properties = ['info', 'claims']
var format = 'json'
var url = wdk.getWikidataIdsFromWikipediaTitles(titles, sites, languages, properties, format)
or using the object interface:
var url = wdk.getWikidataIdsFromWikipediaTitles({
titles: 'Hamburg',
sites: 'dewiki',
languages: ['en', 'fr', 'de'],
properties: ['info', 'claims'],
format: 'json'
})
get entities by other Wikimedia projects titles
associated Wikidata doc: wbgetentities
This is exactly the same interface as with getWikidataIdsFromWikipediaTitles, you just need to specify the sitelink in the form {2 letters language code}{project}:
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwikisource')
Actually, getWikidataIdsFromWikipediaTitles is just an alias of getWikidataIdsFromSitelinks, so you can use it for Wikipedia too:
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'frwiki')
var url = wdk.getWikidataIdsFromSitelinks('Victor Hugo', 'fr')
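For illustration, the {2 letters language code}{project} pattern boils down to a simple concatenation. `toSitelink` below is a hypothetical helper, not a wdk function, and the fallback to the 'wiki' suffix (Wikipedia) when no project is given is an assumption based on the examples above.

```javascript
// Hypothetical helper, NOT part of wikidata-sdk:
// build a sitelink code from a language code and a project name
function toSitelink (lang, project) {
  // assumption: Wikipedia sitelinks use the 'wiki' suffix, e.g. 'frwiki'
  return lang + (project || 'wiki')
}

console.log(toSitelink('fr', 'wikisource')) // frwikisource
console.log(toSitelink('de')) // dewiki
```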
get entities reverse claims
/!\ WDQ will be deprecated, use the SPARQL endpoint instead
In wikidata API answers, you can only access claims on the entity's page, not claims pointing to this entity (what would be in the "what links here" page).
Fortunately, you can query Wikimedia's awesome WDQ tool \o/
(And now also an even more awesome SPARQL endpoint)
For instance, let's say you want to find all the entities that have Leo Tolstoy (Q7243) for author (P50)
var url = wdk.getReverseClaims('P50', 'Q7243')
and you can then query the obtained entities ids
request(url, function(err, response){
// stop here on error instead of trying to parse a missing response
if (err) { return dealWithError(err) }
var entities = wdk.parse.wdq.entities(response)
var url2 = wdk.getEntities(entities)
request(url2 ....
})
It also works for string values: e.g. let's say you want to find which book has 978-0-465-06710-7 for ISBN-13 (P212):
var url = wdk.getReverseClaims('P212', '978-0-465-06710-7')
sparql queries
But now, there is something even more powerful than WDQ: the almighty Wikidata SPARQL endpoint!
SPARQL can be a weird thing at first, but the Wikidata team and community really put a lot of effort into making things easy, with a user manual, an awesome tool to test your queries with autocomplete, and lots of examples!
Then, to get JSON results you can make an HTTP query to https://query.wikidata.org/sparql?query={SPARQL}&format=json, which with wdk can be done like this:
var url = wdk.sparqlQuery(SPARQL)
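As a rough idea of what this amounts to, the url pattern above only requires the query to be URL-encoded. `buildSparqlUrl` is a hypothetical helper written for illustration: it is a sketch of the pattern, not wdk's actual implementation.

```javascript
// Hypothetical helper, NOT part of wikidata-sdk: it mirrors the url pattern
// https://query.wikidata.org/sparql?query={SPARQL}&format=json
function buildSparqlUrl (sparql) {
  // the query contains spaces, braces, '?', etc. and must be URL-encoded
  return 'https://query.wikidata.org/sparql?query=' + encodeURIComponent(sparql) + '&format=json'
}

const exampleQuery = 'SELECT ?work WHERE { ?work wdt:P50 wd:Q535 . }'
const exampleUrl = buildSparqlUrl(exampleQuery)
console.log(exampleUrl)
```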
Example taken from inventaire SPARQL queries (here written using ES6 template string capabilities):
var authorQid = 'Q535'
var sparql = `
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?work ?date WHERE {
?work wdt:P50 wd:${authorQid} .
OPTIONAL {
?work wdt:P577 ?date .
}
}
`
var url = wdk.sparqlQuery(sparql)
Querying this url should return a big collection of objects with work and date attributes corresponding to all Mr Q535's works.
Results parsers
Wikidata API queries
You can pass the results from wdk.searchEntities, wdk.getEntities, wdk.getWikidataIdsFromWikipediaTitles, or wdk.getWikidataIdsFromSitelinks to wdk.parse.wd.entities: it will return entities with simplified claims (cf. "simplify claims results" hereafter).
Simplify claims results
associated Wikidata doc: DataModel
For each entity's claims, Wikidata's API returns a deep object that requires some parsing that could be avoided for simple uses.
So instead of:
"P279": [
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 340169,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$0115863d-4f02-0337-38c2-5e2bb7a0f628"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 2342494,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$04c87c4e-4bce-a9ab-eb75-d9a3ed695077"
},
{
"rank": "normal",
"type": "statement",
"mainsnak": {
"datavalue": {
"type": "wikibase-entityid",
"value": {
"numeric-id": 386724,
"entity-type": "item"
}
},
"datatype": "wikibase-item",
"property": "P279",
"snaktype": "value"
},
"id": "Q571$afe3b5c3-424e-eb7b-60e6-c2ce0d122823"
}
]
we could have
"P279": [ "Q340169", "Q2342494", "Q386724" ]
That's what simplifyClaims, simplifyPropertyClaims, and simplifyClaim do, each at their own level:
simplifyClaims
You just need to pass your entity's claims object to simplifyClaims as such:
var simplifiedClaims = wdk.simplifyClaims(entity.claims)
in your workflow, that could give something like:
var url = wdk.getEntities('Q535')
request(url, function(err, response){
// stop here on error instead of reading a missing response
if (err) { return dealWithError(err) }
var entity = response.entities.Q535
var simplifiedClaims = wdk.simplifyClaims(entity.claims)
})
To keep things simple, "weird" values are removed (for instance, statements of datatype wikibase-item but set to somevalue instead of the expected Q id).
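To give an idea of the kind of work this saves, here is a minimal sketch of such a simplification, limited to claims of datatype wikibase-item. `simplifyItemClaims` is a hypothetical function written for illustration, not the library's implementation, which handles more datatypes.

```javascript
// Hypothetical sketch, NOT wikidata-sdk's implementation:
// only handles claims of datatype 'wikibase-item'
function simplifyItemClaims (propertyClaims) {
  return propertyClaims
    // drop 'somevalue' / 'novalue' snaks, which carry no datavalue
    .filter((claim) => claim.mainsnak.snaktype === 'value')
    // rebuild the Q id from the numeric id
    .map((claim) => 'Q' + claim.mainsnak.datavalue.value['numeric-id'])
}

// Trimmed-down claims, shaped like the API response shown above
const p279Claims = [
  { mainsnak: { snaktype: 'value', datatype: 'wikibase-item', datavalue: { type: 'wikibase-entityid', value: { 'numeric-id': 340169, 'entity-type': 'item' } } } },
  { mainsnak: { snaktype: 'somevalue', datatype: 'wikibase-item' } },
  { mainsnak: { snaktype: 'value', datatype: 'wikibase-item', datavalue: { type: 'wikibase-entityid', value: { 'numeric-id': 2342494, 'entity-type': 'item' } } } }
]

console.log(simplifyItemClaims(p279Claims)) // [ 'Q340169', 'Q2342494' ]
```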
simplifyPropertyClaims
Same as simplifyClaims but expects an array of claims, typically the array of claims of a specific property:
var simplifiedP31Claims = wdk.simplifyPropertyClaims(entity.claims.P31)
simplifyClaim
Same as simplifyClaims but expects a single claim:
var simplifiedP31Claim = wdk.simplifyClaim(entity.claims.P31[0])
Wikidata Query (SPARQL) results
simplify sparql results
With SPARQL queries, you get results that look like this:
{
"head" : {
"vars" : [ "author", "authorLabel", "birth" ]
},
"results" : {
"bindings" : [ {
"author" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q3731207"
},
"authorLabel" : {
"xml:lang" : "en",
"type" : "literal",
"value" : "Ercole Patti"
},
"birth" : {
"datatype" : "http://www.w3.org/2001/XMLSchema#integer",
"type" : "literal",
"value" : "1903"
}
}
]
}
}
simplifySparqlResults converts it to something way simpler:
[
{
"author": {
"value": "Q3731207",
"label": "Ercole Patti"
},
"birth": "1903"
}
]
That's still hairy, because we requested 3 variables, but this gets even simpler if there is only one variable!
Say instead of "vars" : [ "author", "authorLabel", "birth" ], we only ask for "vars" : [ "author" ], the output of simplifySparqlResults will be:
["Q3731207"]
And then to make it even simpler, we can... hum no, that's all we got.
Use it like so:
var simplifiedResults = wdk.simplifySparqlResults(results)
or for a more complete example (using promises)
var url = wdk.sparqlQuery(SPARQL)
promiseRequest(url)
.then(wdk.simplifySparqlResults)
.then((simplifiedResults) => {
// use the simplified results here
})
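For the curious, the conversion described in this section can be sketched in a few lines. `simplifyBindings` and `simplifyValue` are hypothetical functions written for illustration under the assumptions visible in the examples above (entity uris reduced to their id, a fooLabel variable merged into foo, a single variable flattened to an array): this is not the library's actual code.

```javascript
const ENTITY_PREFIX = 'http://www.wikidata.org/entity/'
const LABEL_SUFFIX = 'Label'

// Turn one binding cell into a plain string, stripping the entity uri prefix
function simplifyValue (cell) {
  if (cell.type === 'uri' && cell.value.startsWith(ENTITY_PREFIX)) {
    return cell.value.slice(ENTITY_PREFIX.length)
  }
  return cell.value
}

// Hypothetical sketch, NOT wikidata-sdk's implementation
function simplifyBindings (results) {
  const vars = results.head.vars
  const bindings = results.results.bindings
  // a single variable: return a flat array of values
  if (vars.length === 1) {
    return bindings.map((binding) => simplifyValue(binding[vars[0]]))
  }
  return bindings.map((binding) => {
    const simplified = {}
    vars.forEach((name) => {
      const cell = binding[name]
      if (!cell) return // OPTIONAL variables may be missing from a binding
      // a variable such as authorLabel is merged into author, not kept on its own
      const isLabelOfAnotherVar = name.endsWith(LABEL_SUFFIX) &&
        vars.includes(name.slice(0, -LABEL_SUFFIX.length))
      if (isLabelOfAnotherVar) return
      const labelCell = binding[name + LABEL_SUFFIX]
      simplified[name] = labelCell
        ? { value: simplifyValue(cell), label: labelCell.value }
        : simplifyValue(cell)
    })
    return simplified
  })
}

const sampleResults = {
  head: { vars: ['author', 'authorLabel', 'birth'] },
  results: {
    bindings: [{
      author: { type: 'uri', value: 'http://www.wikidata.org/entity/Q3731207' },
      authorLabel: { 'xml:lang': 'en', type: 'literal', value: 'Ercole Patti' },
      birth: { datatype: 'http://www.w3.org/2001/XMLSchema#integer', type: 'literal', value: '1903' }
    }]
  }
}

console.log(JSON.stringify(simplifyBindings(sampleResults)))
```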
WDQ queries
You can pass the results from wdk.getReverseClaims to wdk.parse.wdq.entities: it will return a list of Wikidata entities Q ids.
Other utils
- isNumericId
- getNumericId
- isWikidataId
- isWikidataEntityId
- isWikidataPropertyId
- normalizeId
- normalizeIds
- wikidataTimeToDateObject
- wikidataTimeToEpochTime
- wikidataTimeToISOString
- normalizeWikidataTime (aliased to wikidataTimeToEpochTime)
A little CoffeeScript and promises workflow demo: that's how I love to work :)
breq = require 'bluereq' # a little request lib returning bluebird-based promises
ids = ['Q647268', 'Q771376', 'Q860998', 'Q965704']
url = wdk.getEntities ids, user.language
breq.get(url)
.then wdk.parse.wd.entities
.then (entities)-> # do useful stuff with those entities data
CLI
moved to wikidata-cli
License
MIT