Fuzzy entity query in Wikidata with SPARQL times out

灰色年华 2021-01-07 03:41

I'm trying to do a fuzzy (i.e. partial or case-insensitive) entity label lookup in Wikidata with SPARQL (via the online endpoint). Unfortunately these return a "Query

3 Answers
  • 2021-01-07 03:50

    You can now use the MediaWiki API directly from SPARQL, using a Wikidata magic service as documented here.

    Example:

    SELECT * WHERE {
      SERVICE wikibase:mwapi {
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam mwapi:search "cheese" .
          bd:serviceParam mwapi:language "en" .
          ?item wikibase:apiOutputItem mwapi:item .
          ?num wikibase:apiOrdinal true .
      }
      ?item (wdt:P279|wdt:P31) ?type
    } ORDER BY ASC(?num) LIMIT 20
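
    For scripted use, the same EntitySearch call can be generated per search term. The following is only a minimal Python sketch, assuming the public endpoint https://query.wikidata.org/sparql with its predefined wikibase:, bd: and mwapi: prefixes. The names build_entity_search_query and run_query are hypothetical helpers, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_entity_search_query(term: str, language: str = "en", limit: int = 20) -> str:
    """Return a SPARQL query that delegates the label search to the MediaWiki API."""
    # json.dumps yields a double-quoted literal with backslash escapes,
    # which is also a valid SPARQL string literal.
    search_literal = json.dumps(term, ensure_ascii=False)
    language_literal = json.dumps(language)
    return f"""SELECT ?item ?num WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:api "EntitySearch" .
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam mwapi:search {search_literal} .
    bd:serviceParam mwapi:language {language_literal} .
    ?item wikibase:apiOutputItem mwapi:item .
    ?num wikibase:apiOrdinal true .
  }}
}} ORDER BY ASC(?num) LIMIT {int(limit)}"""


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result bindings (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            # A descriptive User-Agent is expected by Wikimedia services; this value is a placeholder.
            "User-Agent": "fuzzy-label-lookup-example/0.1",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    print(build_entity_search_query("cheese"))
```

    Calling run_query(build_entity_search_query("cheese")) would then return one binding per matching item, in the order the search API ranked them.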
    
  • 2021-01-07 03:51

    You can do this online if you change your filter to use the "contains" function.

    Example:

    SELECT ?item WHERE {
        ?item rdfs:label ?label .
        FILTER( contains(lcase(?label), 'arles lin') )
    }
    LIMIT 20
    

    Reference: contains is listed as one of the XPath functions you can use in SPARQL. See: https://www.w3.org/2009/sparql/wiki/Feature:FunctionLibrary#XQuery_1.0_and_XPath_2.0_Functions_and_Operators
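
    To make the semantics of that filter concrete: it lower-cases only the label, so the search string itself has to be written in lower case. A small Python sketch of the same test (label_matches is a hypothetical name used purely for illustration):

```python
def label_matches(label: str, needle: str) -> bool:
    """Mirror contains(lcase(?label), needle): only the label is lower-cased."""
    return needle in label.lower()


if __name__ == "__main__":
    # A lower-case needle matches regardless of the label's casing.
    print(label_matches("Charles Lindbergh", "arles lin"))
    # A needle containing upper-case letters can never match a lower-cased label.
    print(label_matches("Charles Lindbergh", "Arles Lin"))
```

    The filter still has to be evaluated against every label the preceding triple pattern produces, so it narrows the results but not the amount of data scanned.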

    Example 2: (with more triples to optimise results)

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?item ?label WHERE {
        ?item rdfs:label ?label .
        ?item rdf:type dbo:Person .   # Works with or without this too; also try skos:Concept
        FILTER( contains(lcase(?label), 'arles lin') && LANGMATCHES(LANG(?label), "en") )
    }
    LIMIT 20
    
  • 2021-01-07 04:06

    Be more specific. Triplestores work with things, not with strings. For example, the following query works fine:

    SELECT ?item WHERE {
        ?item wdt:P735 wd:Q2958359 .
        ?item rdfs:label ?label .
        FILTER (CONTAINS(LCASE(STR(?label)), "lindbergh"))
    }
    

    If it is not possible to be sufficiently specific, you need full-text search capabilities.

    • In fact, Blazegraph supports full-text search using the magic bds:search predicate, but this facility is not enabled on Wikidata.
    • Additionally, Blazegraph supports external full-text search using the magic fts:search predicate. The current implementation supports Apache Solr only. It might be relatively easy to add support for ElasticSearch, which Wikidata uses, but in any case this facility is not enabled either.

    There is a task to provide full-text search in the form of yet another Wikidata magic service, but this functionality is still not available on the public endpoint.

    As a workaround, one can use SQL queries on Quarry. This is my query on Quarry:

    USE wikidatawiki_p; 
    DESCRIBE wb_terms;
    
    SELECT CONCAT("Q", term_entity_id) AS wikidata_id, term_language, term_text, term_search_key
    FROM wb_terms
    WHERE term_type = 'label'
      AND term_search_key IN (LOWER('Lindbergh'), LOWER('Charles Lindbergh'));
    

    The query time limit on Quarry is 30 minutes.
