I'm trying to do a fuzzy (i.e. partial or case-insensitive) entity label lookup in Wikidata with SPARQL (via the online endpoint). Unfortunately these return a "Query
You can now call the MediaWiki API directly from SPARQL through a Wikidata "magic" service, wikibase:mwapi, as described in the Wikidata Query Service MWAPI documentation.
Example:
SELECT * WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "EntitySearch" .
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam mwapi:search "cheese" .
    bd:serviceParam mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
    ?num wikibase:apiOrdinal true .
  }
  ?item (wdt:P279|wdt:P31) ?type
} ORDER BY ASC(?num) LIMIT 20
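As far as I can tell from the MWAPI documentation, the "EntitySearch" service wraps the wbsearchentities action of the MediaWiki API, so the same ranked search can also be run outside SPARQL. The sketch below assumes the public endpoint https://www.wikidata.org/w/api.php and that the JSON response lists hits under a "search" key; the function names and the User-Agent string are hypothetical placeholders, not part of any library.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 20) -> str:
    """Build a wbsearchentities URL (assumed to be what EntitySearch calls)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_search_ids(payload: dict) -> list[str]:
    """Return entity IDs in the ranked order the API returned them."""
    return [hit["id"] for hit in payload.get("search", [])]


def search_entities(term: str, language: str = "en", limit: int = 20) -> list[str]:
    """Run the search over HTTP and return the matching QIDs."""
    req = urllib.request.Request(
        build_search_url(term, language, limit),
        # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "fuzzy-label-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_search_ids(json.load(resp))
```

Calling search_entities("cheese") should give the QIDs in the same order that ?num sorts them in the query above; the type join on wdt:P279|wdt:P31 is still something only the SPARQL side does.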
You can do this online if you change your filter to use the "contains" function.
Example:
SELECT ?item WHERE {
  ?item rdfs:label ?label .
  FILTER( contains(lcase(?label), 'arles lin' ))
}
LIMIT 20
Reference: contains is listed as one of the XPath functions you can use in SPARQL. See: https://www.w3.org/2009/sparql/wiki/Feature:FunctionLibrary#XQuery_1.0_and_XPath_2.0_Functions_and_Operators
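To make the semantics of FILTER( contains(lcase(?label), 'arles lin' )) concrete, here is a small local mimic in Python. It is only an illustration: sparql_contains_lcase is a hypothetical helper, the labels are made up, and Python's str.lower is assumed to behave like SPARQL's LCASE for these ASCII strings. Note that only the label is lower-cased, so the search string itself has to be written in lower case.

```python
def sparql_contains_lcase(label: str, needle: str) -> bool:
    """Mimic FILTER(contains(lcase(?label), needle)) for a plain label string.

    Only the label is lower-cased; the needle is compared as given,
    so it must already be lower-case for a case-insensitive match.
    """
    return needle in label.lower()


# Made-up labels for illustration only.
labels = ["Charles Lindbergh", "CHARLES LINDBERGH", "Carl Linnaeus", "Arles"]
matches = [label for label in labels if sparql_contains_lcase(label, "arles lin")]
```

Here matches keeps both spellings of "Charles Lindbergh" and drops the others, while a needle such as "Arles Lin" would match nothing. The endpoint has to apply this test to every label the triple patterns let through, so adding more specific patterns keeps that set small.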
Example 2 (with more triple patterns to narrow the results):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  ?item rdf:type dbo:Person .  # works with or without this line too; also try skos:Concept
  FILTER( contains(lcase(?label), 'arles lin' ) && LANGMATCHES(LANG(?label), "en"))
}
LIMIT 20
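The dbo: prefix suggests Example 2 targets DBpedia rather than Wikidata. A minimal sketch for running such a SELECT from a script and reading the results, assuming the endpoint (for example https://dbpedia.org/sparql) accepts a GET request with a query parameter, as the SPARQL 1.1 Protocol describes, and returns the standard SPARQL JSON results format. The helper names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder User-Agent; public endpoints may throttle anonymous clients.
            "User-Agent": "fuzzy-label-example/0.1",
        },
    )


def bindings_to_rows(results: dict, variables: list[str]) -> list[tuple]:
    """Flatten SPARQL JSON results into tuples of values (None if unbound)."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append(tuple(binding[v]["value"] if v in binding else None for v in variables))
    return rows


def run_select(endpoint: str, query: str, variables: list[str]) -> list[tuple]:
    """Send the query and return one tuple per solution."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query), timeout=60) as resp:
        return bindings_to_rows(json.load(resp), variables)
```

For Example 2 you would pass ["item", "label"] as the variables. Keeping request building and result parsing separate from the network call makes both easy to check without hitting a live endpoint.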
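Be more specific. Triplestores work with things, not with strings. For example, the following query, which first restricts ?item to entities whose given name (P735) is wd:Q2958359 and only then filters their labels, works fine: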
SELECT ?item WHERE {
  ?item wdt:P735 wd:Q2958359 .
  ?item rdfs:label ?label .
  FILTER (CONTAINS(LCASE(STR(?label)), "lindbergh"))
}
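If you generate queries like this one from user input, the search string has to be escaped before it goes into the SPARQL string literal. A minimal sketch, in which build_given_name_query and sparql_string_literal are hypothetical helpers: the entity ID is validated against the usual Q-number shape, and the substring is lower-cased to match the LCASE in the filter.

```python
import re


def sparql_string_literal(text: str) -> str:
    """Quote text as a double-quoted SPARQL literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_given_name_query(given_name_qid: str, substring: str) -> str:
    """Restrict by given name (P735) first, then filter the remaining labels."""
    if not re.fullmatch(r"Q[1-9][0-9]*", given_name_qid):
        raise ValueError(f"not a Wikidata item ID: {given_name_qid!r}")
    needle = sparql_string_literal(substring.lower())
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P735 wd:{given_name_qid} .\n"
        "  ?item rdfs:label ?label .\n"
        f"  FILTER (CONTAINS(LCASE(STR(?label)), {needle}))\n"
        "}"
    )
```

build_given_name_query("Q2958359", "Lindbergh") reproduces the query above. Because it returns one row per matching label, you may want SELECT DISTINCT ?item if the same item has several matching labels.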
If it is not possible to be sufficiently specific, you need full-text search capabilities.
Blazegraph, the triplestore behind the Wikidata Query Service, offers built-in full-text search via the bds:search predicate, but this facility is not enabled on Wikidata. Blazegraph can also query an external full-text index via the fts:search predicate. The current implementation supports Apache Solr only. Perhaps it would be relatively easy to support Elasticsearch, which Wikidata uses, but in any case this facility is not enabled either.
There is a task to provide full-text search in the form of yet another Wikidata magic service, but this functionality is not yet available on the public endpoint.
As a workaround, one can use SQL queries on Quarry. This is my query on Quarry:
USE wikidatawiki_p;
-- Optional: list the columns of wb_terms
DESCRIBE wb_terms;
SELECT CONCAT("Q", term_entity_id) AS wikidata_id, term_language, term_text, term_search_key
FROM wb_terms
WHERE term_type = 'label'
  -- exact match on the lower-cased search key, not a substring match
  AND term_search_key IN (LOWER('Lindbergh'), LOWER('Charles Lindbergh'));
The query time limit on Quarry is 30 minutes.
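To see what the Quarry query matches without access to the replicas, here is a toy reproduction using Python's built-in sqlite3. The table has only the columns the query uses, the rows are invented, and the search key is assumed to be simply the lower-cased label. SQLite's || stands in for MySQL's CONCAT.

```python
import sqlite3

# Toy stand-in for wb_terms; the rows are invented, not real Wikidata data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE wb_terms (
        term_entity_id INTEGER,
        term_language TEXT,
        term_type TEXT,
        term_text TEXT,
        term_search_key TEXT
    )"""
)
rows = [
    (1, "en", "label", "Charles Lindbergh", "charles lindbergh"),
    (1, "fr", "label", "Charles Lindbergh", "charles lindbergh"),
    (2, "en", "label", "Lindbergh", "lindbergh"),
    (3, "en", "description", "Lindbergh", "lindbergh"),
    (4, "en", "label", "Anne Morrow Lindbergh", "anne morrow lindbergh"),
]
conn.executemany("INSERT INTO wb_terms VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the Quarry query, with parameters instead of inline strings.
query = """
SELECT 'Q' || term_entity_id AS wikidata_id, term_language, term_text, term_search_key
FROM wb_terms
WHERE term_type = 'label'
  AND term_search_key IN (LOWER(?), LOWER(?))
ORDER BY term_entity_id, term_language
"""
result = conn.execute(query, ("Lindbergh", "Charles Lindbergh")).fetchall()
```

The description row is excluded by term_type, and "Anne Morrow Lindbergh" is excluded because IN compares whole search keys: this approach makes the lookup case-insensitive but not partial.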