DBpedia SPARQL filter does not apply to all results

六月ゝ 毕业季﹏ submitted on 2020-01-14 13:55:08

Question


A FILTER NOT EXISTS clause lets through some results that it should exclude when it is combined with OPTIONAL patterns.

My query:

SELECT DISTINCT * WHERE 
{
  {
    ?en rdfs:label "N'Djamena"@en .
    BIND("N'Djamena" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Port Vila"@en .
    BIND("Port Vila" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Atafu"@en .
    BIND("Atafu" AS ?name) .
  }
  FILTER NOT EXISTS { ?en rdf:type skos:Concept } .
  OPTIONAL { ?en owl:sameAs ?es . FILTER regex(?es, "es.dbpedia") .  }
  OPTIONAL { ?en owl:sameAs ?pt . FILTER regex(?pt, "pt.dbpedia") .  }
} 
LIMIT 100

This query gets the three places as expected, but it also pulls back "Category:Atafu", which should be filtered out by virtue of having "rdf:type skos:Concept".

Without the OPTIONAL lines, I get the three places as expected. If I make those two patterns required instead of optional, I get only two of the places, because Atafu doesn't have a page in Portuguese.

I can also move the FILTER NOT EXISTS statement into each of the UNION'd country blocks, but that seems to hurt the server's response time.
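
A sketch of that per-branch variant, assuming the filter is simply repeated inside each UNION block (the exact query was not posted, so the placement is an assumption; the prefixes are the ones the DBpedia endpoint predefines):

SELECT DISTINCT * WHERE
{
  {
    ?en rdfs:label "N'Djamena"@en .
    # the filter is scoped to this branch only
    FILTER NOT EXISTS { ?en rdf:type skos:Concept }
    BIND("N'Djamena" AS ?name)
  }
  UNION {
    ?en rdfs:label "Port Vila"@en .
    FILTER NOT EXISTS { ?en rdf:type skos:Concept }
    BIND("Port Vila" AS ?name)
  }
  UNION {
    ?en rdfs:label "Atafu"@en .
    FILTER NOT EXISTS { ?en rdf:type skos:Concept }
    BIND("Atafu" AS ?name)
  }
  OPTIONAL { ?en owl:sameAs ?es . FILTER regex(?es, "es.dbpedia") }
  OPTIONAL { ?en owl:sameAs ?pt . FILTER regex(?pt, "pt.dbpedia") }
}
LIMIT 100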

Why does the FILTER NOT EXISTS clause filter out "Category:N'Djamena" and "Category:Port_Vila" but not "Category:Atafu" when it is followed by OPTIONAL?


Answer 1:


I really have no idea why your query doesn't work. I'd have to chalk it up to some weird Virtuoso thing, because there's definitely something strange going on. For instance, if you remove the BIND from the last UNION branch (the one for Atafu), you'll get the resources you're expecting:

SELECT DISTINCT * WHERE 
{
  {
    ?en rdfs:label "N'Djamena"@en .
    BIND("N'Djamena" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Port Vila"@en .
    BIND("Port Vila" AS ?name) .
  } 
  UNION {
    ?en rdfs:label "Atafu"@en .
  }
  FILTER NOT EXISTS { ?en rdf:type skos:Concept }
  OPTIONAL { ?en owl:sameAs ?es . FILTER regex(?es, "es.dbpedia") }
  OPTIONAL { ?en owl:sameAs ?pt . FILTER regex(?pt, "pt.dbpedia") .  }
} 
LIMIT 100


It's really pretty weird. Here's a modified version of your query that gets the results you're looking for. It uses VALUES instead of UNION, which makes the query simpler. It should be logically equivalent, though, so I'm not sure why it makes a difference.

select distinct * where {
  values ?label { "N'Djamena"@en "Port Vila"@en "Atafu"@en }
  ?en rdfs:label ?label .
  optional { ?en owl:sameAs ?pt . filter regex(?pt, "pt.dbpedia") }
  optional { ?en owl:sameAs ?es . filter regex(?es, "es.dbpedia") }
  filter not exists { ?en a skos:Concept }
  bind(str(?label) as ?name)
}


I'd actually clean up the string matching though, since regular expressions are probably more power than you need here. You just want to check whether the value starts with a given substring:

select ?en ?label (str(?label) as ?name) ?es ?pt where {
  values ?label { "N'Djamena"@en "Port Vila"@en "Atafu"@en }
  ?en rdfs:label ?label .
  optional { ?en owl:sameAs ?pt . filter strstarts(str(?pt), "http://pt.dbpedia") }
  optional { ?en owl:sameAs ?es . filter strstarts(str(?es), "http://es.dbpedia") }
  filter not exists { ?en a skos:Concept }
}
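
To see why the prefix test is the tighter check, a small self-contained query can compare the two functions on constant IRIs. This is only a sketch: the three IRIs below are made up for illustration and need not exist in DBpedia, and since the query touches no stored data it should run on any SPARQL 1.1 endpoint.

select ?uri
       (regex(str(?uri), "es.dbpedia") as ?regexMatch)
       (strstarts(str(?uri), "http://es.dbpedia") as ?prefixMatch)
where {
  # hypothetical IRIs, chosen only to exercise the two tests
  values ?uri {
    <http://es.dbpedia.org/resource/Atafu>
    <http://example.org/mirror?src=es.dbpedia.org>
    <http://example.org/es-dbpedia>
  }
}

By the SPARQL 1.1 definitions, regex is unanchored and its "." matches any character, so all three IRIs should pass the regex test, while only the first one starts with "http://es.dbpedia". The str() call matters as well: both functions are defined on string literals rather than IRIs, so regex(?es, ...) without str() appears to rely on Virtuoso being lenient about the argument type.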




Source: https://stackoverflow.com/questions/36440591/dbpedia-sparql-filter-does-not-apply-to-all-results
