SPARQL query returns multiple birth dates for same person

核能气质少年 submitted on 2019-12-24 07:25:09

Question


I am learning SPARQL and DBpedia by working through the queries in https://www.joe0.com/2014/09/22/how-to-use-sparql-to-query-dbpedia-and-freebase/ . I am testing a query that returns John Lennon's date of birth, and I am running my queries at http://dbpedia.org/sparql . The query is:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?x0 ?x1 WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label "John Lennon"@en.
?x0 dbpedia-owl:birthDate ?x1.
}

It returns two rows containing the same date (9 Oct 1940). My question is: why does the query return two rows even though it uses DISTINCT? Prior to asking this question I checked the following:

  • Why does my SPARQL query duplicate results?
  • Duplicate rows when making SPARQL queries

but I don't think they explain the duplicate dates.

Edit: I converted the results to text and pasted them below

--------------------------------------- -----------------------------------------------------
x0                                      x1
--------------------------------------- -----------------------------------------------------
http://dbpedia.org/resource/John_Lennon 1940-10-09
http://dbpedia.org/resource/John_Lennon "1940-10-9"^^<http://www.w3.org/2001/XMLSchema#date>

Answer 1:


As stated, DBpedia actually seems to hold two dates: 1940-10-09 (valid) and 1940-10-9 (invalid). The fix is to add a FILTER that converts the date to a string and only keeps values that contain a YYYY-MM-DD pattern. Anyway, it works!

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?x0 ?x1 (STR(?x1) AS ?x1str) WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label "John Lennon"@en.
?x0 dbpedia-owl:birthDate ?x1.
FILTER (REGEX(STR(?x1),"[0-9]{4}-[0-9]{2}-[0-9]{2}")).
} 



Answer 2:


Well, it is not your fault! The resource simply has both of these triples; there are duplicates in the data.




Answer 3:


I ran your query on the DBpedia endpoint and asked for the results in an RDF-based format (Turtle), and found that the lexical forms of the date literals are actually different:

"1940-10-09"^^xsd:date
"1940-10-9"^^xsd:date

The second isn't actually a legal xsd:date. The first is, which is probably why the SPARQL endpoint prints it in "pretty" fashion in the HTML table (as just 1940-10-09).
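
The difference between the two lexical forms can be checked outside the endpoint. The following is a minimal Python sketch, assuming a simplified xsd:date pattern (four-digit year, two-digit month and day, optional timezone). The names XSD_DATE and is_legal_xsd_date are hypothetical, and the check only covers the shape of the string, not calendar validity (it would accept 1940-02-30).

```python
import re

# Simplified xsd:date lexical form: YYYY-MM-DD with an optional timezone.
# Assumption: negative years and years longer than four digits are ignored.
XSD_DATE = re.compile(
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])(Z|[+-]\d{2}:\d{2})?"
)


def is_legal_xsd_date(lexical: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return XSD_DATE.fullmatch(lexical) is not None


# The two lexical forms returned for John Lennon's birth date.
for lexical in ["1940-10-09", "1940-10-9"]:
    print(lexical, is_legal_xsd_date(lexical))
```

Only the first form passes, because the day field must have exactly two digits.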




Answer 4:


The result is slower queries: either each access to an invalid date triggers an exception (for example, with a query from Fuseki), or the filter does the job of eliminating the wrong date, but that is costly.



Source: https://stackoverflow.com/questions/50060490/sparql-query-returns-multiple-birth-dates-for-same-person
