Exclude results from DBpedia SPARQL query based on URI prefix

Submitted by 只谈情不闲聊 on 2019-12-17 06:16:07

Question


How can I exclude a group of concepts when using the DBpedia SPARQL endpoint? I'm using the following basic query to get a list of concepts:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
}
LIMIT 100

This gives me a list of 100 concepts. I want to exclude all the concepts that fall into the YAGO class/group (i.e., whose IRIs begin with http://dbpedia.org/class/yago/). I can filter out individual concepts like this:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
    FILTER (?concept != <http://dbpedia.org/class/yago/1950sScienceFictionFilms>)
}
LIMIT 100

What I can't work out is how to exclude all YAGO subclasses from my results. I tried using * as a wildcard like this, but it had no effect:

FILTER (?concept != <http://dbpedia.org/class/yago/*>)
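
The * attempt changes nothing because, in SPARQL, != between two IRIs compares them as whole terms, and there is no wildcard syntax inside <...>. So <http://dbpedia.org/class/yago/*> is just one more IRI that no real class has. The Python sketch below models that filter as plain string inequality; the sample IRIs are illustrative, not output from a live query.

```python
# Sketch: model FILTER (?concept != <IRI>) as exact string inequality.
# The sample IRIs below are illustrative only.
WILDCARD_IRI = "http://dbpedia.org/class/yago/*"  # taken literally, not as a pattern

concepts = [
    "http://dbpedia.org/class/yago/1950sScienceFictionFilms",
    "http://dbpedia.org/ontology/Film",
]


def passes_not_equal(concept: str, excluded: str) -> bool:
    """True when the two IRIs differ as complete strings."""
    return concept != excluded


kept = [c for c in concepts if passes_not_equal(c, WILDCARD_IRI)]
print(kept)  # both IRIs survive, because neither is literally ".../yago/*"
```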

Update:

This query using regex seems to do the trick, but it's really slow and ugly. I'm hoping for a better alternative.

SELECT DISTINCT ?type WHERE {
  [] a ?type
  FILTER( regex(str(?type), "^(?!http://dbpedia.org/class/yago/).+"))
}
ORDER BY ASC(?type)
LIMIT 10
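
The lookahead pattern keeps exactly the IRIs that do not start with the YAGO prefix, so it is a prefix test done the long way round. Python's re module also supports (?!...) lookahead, so a quick check on a few sample IRIs (illustrative, not query output) shows the equivalence. There are two caveats. First, the unescaped dots in the pattern match any character, which is harmless for real DBpedia IRIs. Second, SPARQL defines REGEX in terms of XPath regular expressions, which as far as I know do not include lookahead, so the query depends on the endpoint's regex engine accepting it.

```python
import re

YAGO_PREFIX = "http://dbpedia.org/class/yago/"

# Same pattern as the question's FILTER: a non-empty string NOT starting with the prefix.
NOT_YAGO = re.compile(r"^(?!http://dbpedia.org/class/yago/).+")


def keep_by_regex(iri: str) -> bool:
    return NOT_YAGO.search(iri) is not None


def keep_by_prefix(iri: str) -> bool:
    # The same test expressed directly as "does not start with the prefix".
    return not iri.startswith(YAGO_PREFIX)


samples = [
    "http://dbpedia.org/class/yago/Dryad109551040",
    "http://dbpedia.org/ontology/Person",
    "http://www.w3.org/2002/07/owl#Thing",
]

for iri in samples:
    print(keep_by_regex(iri), keep_by_prefix(iri), iri)
```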

Answer 1:


It might seem a little awkward, but your comment about casting to a string and doing some string-based checks is probably on the right track. You can do it a little bit more efficiently using the SPARQL 1.1 function strstarts:

SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
    FILTER ( !strstarts(str(?concept), "http://dbpedia.org/class/yago/") )
}
LIMIT 100
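
To run this query from a program rather than through the endpoint's web form, the SPARQL 1.1 Protocol lets a client send it as the URL-encoded query parameter of a GET request. The Python sketch below only builds the request. It assumes the public endpoint is at https://dbpedia.org/sparql and asks for the standard SPARQL JSON results format through the Accept header. Sending the request needs network access, so that call is left commented out.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request, urlopen

# Assumption: DBpedia's public SPARQL endpoint URL.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """SELECT DISTINCT ?concept
WHERE {
    ?x a ?concept
    FILTER ( !strstarts(str(?concept), "http://dbpedia.org/class/yago/") )
}
LIMIT 100"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def fetch_concepts(req: Request) -> list:
    """Send the request and pull the ?concept values out of the JSON results."""
    with urlopen(req) as resp:
        data = json.load(resp)
    return [row["concept"]["value"] for row in data["results"]["bindings"]]


if __name__ == "__main__":
    req = build_request(ENDPOINT, QUERY)
    # The query text survives the URL encoding unchanged.
    print(parse_qs(urlparse(req.full_url).query)["query"][0] == QUERY)
    # print(fetch_concepts(req))  # uncomment to query the live endpoint
```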

The other alternative would be to find a top-level YAGO class and to exclude those concepts that are rdfs:subClassOf that top-level class. This would probably be a better solution in the long run, since it doesn't require casting to strings and it's based on graph structure. Unfortunately, it doesn't look like there is a single top-level YAGO class comparable to owl:Thing. I just downloaded the YAGO type hierarchy from DBpedia's download page and ran this query against it, which asks for classes that serve as a superclass but have no superclass of their own:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?root where {
  [] rdfs:subClassOf ?root 
  filter not exists { ?root rdfs:subClassOf ?superRoot }
}

and I got these nine results:

----------------------------------------------------------------
| root                                                         |
================================================================
| <http://dbpedia.org/class/yago/YagoLegalActorGeo>            |
| <http://dbpedia.org/class/yago/WaterNymph109550125>          |
| <http://dbpedia.org/class/yago/PhysicalEntity100001930>      |
| <http://dbpedia.org/class/yago/Abstraction100002137>         |
| <http://dbpedia.org/class/yago/YagoIdentifier>               |
| <http://dbpedia.org/class/yago/YagoLiteral>                  |
| <http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity> |
| <http://dbpedia.org/class/yago/Thing104424418>               |
| <http://dbpedia.org/class/yago/Dryad109551040>               |
----------------------------------------------------------------
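
The root-finding query can be mirrored on a toy hierarchy to see what it computes: every class that occurs as the object of some rdfs:subClassOf triple, minus every class that occurs as the subject of one. The class names below are hypothetical stand-ins, not the real YAGO hierarchy.

```python
# Toy (subclass, superclass) pairs standing in for rdfs:subClassOf triples.
# These names are hypothetical, not real YAGO classes.
SUBCLASS_OF = [
    ("ScienceFictionFilm", "Film"),
    ("Film", "Show"),
    ("Person", "Organism"),
    ("Organism", "LivingThing"),
]


def find_roots(edges):
    """Mirror of:
        [] rdfs:subClassOf ?root
        FILTER NOT EXISTS { ?root rdfs:subClassOf ?superRoot }
    i.e. classes used as a superclass that have no superclass of their own.
    Returned sorted so the output is deterministic (a set gives DISTINCT).
    """
    has_superclass = {sub for sub, _ in edges}
    used_as_superclass = {sup for _, sup in edges}
    return sorted(used_as_superclass - has_superclass)


print(find_roots(SUBCLASS_OF))  # ['LivingThing', 'Show']
```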

Given that the YAGO concepts aren't quite as structured as some of the others, it looks like the string-based approach may be the best in this case. However, if you wanted to, you could write a non-string-based query like this, which asks for 100 concepts, excluding those that have one of those nine results as a superclass:

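# DBpedia's endpoint predefines these prefixes; declaring them keeps the query portable.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix yago: <http://dbpedia.org/class/yago/>

select distinct ?concept where {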
  [] a ?concept .
  filter not exists {
    ?concept rdfs:subClassOf* ?super .
    values ?super { 
      yago:YagoLegalActorGeo
      yago:WaterNymph109550125
      yago:PhysicalEntity100001930
      yago:Abstraction100002137
      yago:YagoIdentifier
      yago:YagoLiteral
      yago:YagoPermanentlyLocatedEntity
      yago:Thing104424418
      yago:Dryad109551040
    }
  }
}
limit 100
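
The key detail in this query is the * in rdfs:subClassOf*: it matches paths of zero or more steps, so each root class is excluded along with everything beneath it. Here is a minimal Python sketch of the same exclusion, using a breadth-first walk up a toy hierarchy. The names are hypothetical, and the two roots stand in for the nine listed earlier.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (subclass, superclass) pairs and the chosen root classes.
SUBCLASS_OF = [
    ("ScienceFictionFilm", "Film"),
    ("Film", "Show"),
    ("Person", "Organism"),
    ("Organism", "LivingThing"),
    ("Orphan", "Unrelated"),
]
ROOTS = {"Show", "LivingThing"}


def superclass_map(edges):
    """Index the pairs as subclass -> set of direct superclasses."""
    supers = defaultdict(set)
    for sub, sup in edges:
        supers[sub].add(sup)
    return supers


def reaches_any(concept, supers, targets):
    """True if `concept rdfs:subClassOf* t` holds for some t in targets.

    The walk starts at the concept itself (the zero-length path) and keeps a
    visited set so a cycle in the hierarchy cannot loop forever.
    """
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if current in targets:
            return True
        for sup in supers.get(current, ()):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return False


supers = superclass_map(SUBCLASS_OF)
concepts = ["ScienceFictionFilm", "Show", "Person", "Place", "Orphan"]
# FILTER NOT EXISTS { ?concept rdfs:subClassOf* ?super . VALUES ?super { ... } }
kept = [c for c in concepts if not reaches_any(c, supers, ROOTS)]
print(kept)  # ['Place', 'Orphan']
```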

I'm not sure which ends up being faster. The first requires a conversion to string, and strstarts, if implemented naïvely, compares characters left to right, so for a concept whose IRI begins with http://dbpedia.org/class/ it has to consume that whole shared part before finding a mismatch. The second requires, for each class reached along the rdfs:subClassOf* path, up to nine comparisons that, if IRIs are interned, are just object identity checks. It's an interesting question for further investigation.
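
The cost argument about strstarts can be made concrete with a naive left-to-right prefix check that reports how many characters it inspects. This is an illustrative Python model, not how any particular SPARQL engine implements strstarts, and the sample IRIs are just examples.

```python
YAGO_PREFIX = "http://dbpedia.org/class/yago/"


def naive_startswith(s: str, prefix: str):
    """Left-to-right prefix test returning (matched, characters_inspected)."""
    inspected = 0
    for a, b in zip(s, prefix):
        inspected += 1
        if a != b:
            return False, inspected
    # Every compared character matched; s must also be at least as long as prefix.
    return len(s) >= len(prefix), inspected


for iri in [
    "http://dbpedia.org/class/yago/Dryad109551040",
    "http://dbpedia.org/ontology/Person",
    "http://www.w3.org/2002/07/owl#Thing",
]:
    print(naive_startswith(iri, YAGO_PREFIX), iri)
```

In this model a YAGO IRI costs the full 30-character prefix, and any other IRI under http://dbpedia.org/ costs the 19 shared characters plus the mismatching one, whereas comparing interned IRIs costs one identity check each.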



Source: https://stackoverflow.com/questions/19044871/exclude-results-from-dbpedia-sparql-query-based-on-uri-prefix
