Exclude results from DBpedia SPARQL query based on URI prefix

余生分开走 2020-11-28 15:07

How can I exclude a group of concepts when using the DBpedia SPARQL endpoint? I'm using the following basic query to get a list of concepts:

SELECT DISTIN         


        
1 Answer
  • 2020-11-28 16:07

    It might seem a little awkward, but your comment about casting to a string and doing some string-based checks is probably on the right track. You can do it a little bit more efficiently using the SPARQL 1.1 function strstarts:

    SELECT DISTINCT ?concept
    WHERE {
        ?x a ?concept
        FILTER ( !strstarts(str(?concept), "http://dbpedia.org/class/yago/") )
    }
    LIMIT 100
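    To see what that FILTER does to each row, here is a minimal Python sketch that applies the same prefix test to an in-memory list of IRIs. The helper name `exclude_by_prefix` and the sample IRIs are hypothetical illustrations, not live DBpedia results, and LIMIT is not modeled.

```python
# Minimal in-memory sketch of:
#   SELECT DISTINCT ?concept WHERE { ... FILTER(!strstarts(str(?concept), PREFIX)) }
# The sample IRIs below are illustrative toy data, not live DBpedia results.
YAGO_PREFIX = "http://dbpedia.org/class/yago/"


def exclude_by_prefix(concepts, prefix=YAGO_PREFIX):
    """Return the distinct concepts whose IRI string does not start with prefix.

    Order of first appearance is preserved; SPARQL itself promises no
    particular order without ORDER BY.
    """
    seen = set()
    kept = []
    for concept in concepts:
        # str(?concept) is a no-op here because the IRIs are already strings.
        if concept.startswith(prefix) or concept in seen:
            continue
        seen.add(concept)
        kept.append(concept)
    return kept


sample = [
    "http://dbpedia.org/ontology/Person",
    "http://dbpedia.org/class/yago/Thing104424418",
    "http://xmlns.com/foaf/0.1/Person",
    "http://dbpedia.org/ontology/Person",  # duplicate, dropped as by DISTINCT
]
print(exclude_by_prefix(sample))
# ['http://dbpedia.org/ontology/Person', 'http://xmlns.com/foaf/0.1/Person']
```

    The endpoint does the same work per candidate row: one string conversion and one prefix comparison.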
    


    The other alternative would be to find a top-level YAGO class and exclude those concepts that are rdfs:subClassOf that top-level class. This would probably be a better solution in the long run, since it doesn't require casting to strings and it's based on the graph structure. Unfortunately, it doesn't look like there is a single top-level YAGO class comparable to owl:Thing. I just downloaded the YAGO type hierarchy from DBpedia's download page and ran this query against it; it asks for classes that have at least one subclass but no superclass of their own:

    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    select distinct ?root where {
      [] rdfs:subClassOf ?root 
      filter not exists { ?root rdfs:subClassOf ?superRoot }
    }
    

    and I got these nine results:

    ----------------------------------------------------------------
    | root                                                         |
    ================================================================
    | <http://dbpedia.org/class/yago/YagoLegalActorGeo>            |
    | <http://dbpedia.org/class/yago/WaterNymph109550125>          |
    | <http://dbpedia.org/class/yago/PhysicalEntity100001930>      |
    | <http://dbpedia.org/class/yago/Abstraction100002137>         |
    | <http://dbpedia.org/class/yago/YagoIdentifier>               |
    | <http://dbpedia.org/class/yago/YagoLiteral>                  |
    | <http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity> |
    | <http://dbpedia.org/class/yago/Thing104424418>               |
    | <http://dbpedia.org/class/yago/Dryad109551040>               |
    ----------------------------------------------------------------
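    The root-finding query can be mimicked over a plain list of (subclass, superclass) pairs. This is a minimal sketch: the function name `find_roots` and the `ex:` class names are hypothetical toy data, not part of the YAGO dump.

```python
# Sketch of the root-finding query over a hypothetical list of
# (subclass, superclass) pairs; the ex: names are made up for illustration.
def find_roots(subclass_pairs):
    """Classes that appear as a superclass but never as a subclass.

    Mirrors  [] rdfs:subClassOf ?root
             FILTER NOT EXISTS { ?root rdfs:subClassOf ?superRoot }
    """
    has_superclass = {sub for sub, _ in subclass_pairs}
    roots, seen = [], set()
    for _, sup in subclass_pairs:
        if sup not in has_superclass and sup not in seen:
            seen.add(sup)
            roots.append(sup)
    return roots


toy_hierarchy = [
    ("ex:Cat", "ex:Mammal"),
    ("ex:Mammal", "ex:Animal"),
    ("ex:Animal", "ex:Entity"),
    ("ex:Oak", "ex:Plant"),
    ("ex:Nymph", "ex:Spirit"),
]
print(find_roots(toy_hierarchy))  # ['ex:Entity', 'ex:Plant', 'ex:Spirit']
```

    Like the query, this only reports a class as a root if something is declared as its subclass, and a hierarchy that is not joined under one top class yields several roots, as the YAGO hierarchy did.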
    

    Given that the YAGO concepts aren't quite as structured as some of the others, it looks like the string-based approach may be the best in this case. However, if you wanted to, you could use a non-string-based query like this one, which asks for 100 concepts, excluding any concept that is one of those nine roots or has one of them as a direct or indirect superclass (rdfs:subClassOf* also matches zero-length paths):

    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix yago: <http://dbpedia.org/class/yago/>
    
    select distinct ?concept where {
      [] a ?concept .
      filter not exists {
        ?concept rdfs:subClassOf* ?super .
        values ?super { 
          yago:YagoLegalActorGeo
          yago:WaterNymph109550125
          yago:PhysicalEntity100001930
          yago:Abstraction100002137
          yago:YagoIdentifier
          yago:YagoLiteral
          yago:YagoPermanentlyLocatedEntity
          yago:Thing104424418
          yago:Dryad109551040
        }
      }
    }
    limit 100
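    Here is a minimal in-memory Python sketch of what the rdfs:subClassOf* exclusion computes. The helper names (`ancestors_or_self`, `exclude_under_roots`) and the `ex:` hierarchy are hypothetical toy data, not DBpedia. Because `*` includes the zero-length path, a concept that is itself an excluded root is dropped too.

```python
from collections import deque


def ancestors_or_self(cls, superclasses_of):
    """Every class reachable from cls by zero or more subClassOf steps.

    This is the set ?super can take in  ?concept rdfs:subClassOf* ?super ;
    the visited set keeps cycles in the hierarchy from looping forever.
    """
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for sup in superclasses_of.get(current, ()):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


def exclude_under_roots(concepts, superclasses_of, excluded_roots, limit=100):
    """Distinct concepts with no excluded root among their ancestors-or-self."""
    kept, seen = [], set()
    for concept in concepts:
        if concept in seen:  # SELECT DISTINCT
            continue
        seen.add(concept)
        # FILTER NOT EXISTS: drop the concept if the path reaches any root.
        if ancestors_or_self(concept, superclasses_of) & excluded_roots:
            continue
        kept.append(concept)
        if len(kept) == limit:  # LIMIT
            break
    return kept


toy_superclasses = {
    "ex:Cat": ["ex:Mammal"],
    "ex:Mammal": ["ex:Animal"],
    "ex:Animal": ["ex:Entity"],
    "ex:Oak": ["ex:Plant"],
}
toy_concepts = ["ex:Cat", "ex:Oak", "ex:Entity", "ex:Plant", "ex:Car"]
print(exclude_under_roots(toy_concepts, toy_superclasses, {"ex:Entity"}))
# ['ex:Oak', 'ex:Plant', 'ex:Car']
```

    Note that, unlike the prefix filter, this only excludes classes whose declared hierarchy actually reaches one of the listed roots.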
    


    I'm not sure which ends up being faster. The first requires a conversion to a string, and strstarts, if implemented naïvely, has to scan every character a concept shares with http://dbpedia.org/class/yago/ before it finds a mismatch or reaches the end of the prefix. The second requires up to nine comparisons per candidate superclass, which, if IRIs are interned, are just object-identity checks (though the rdfs:subClassOf* path still has to be walked). It's an interesting question for further investigation.
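    A rough Python illustration of the two costs being compared: a naïve character-by-character prefix test that counts comparisons, and membership by object identity against the nine roots after interning with `sys.intern`. This is not a benchmark of any triple store; the helper names are hypothetical, and whether a given store interns IRIs is an assumption.

```python
import sys

YAGO_PREFIX = "http://dbpedia.org/class/yago/"


def naive_startswith(s, prefix):
    """Character-by-character prefix test that also counts comparisons.

    Every character shared with the prefix is compared before a mismatch
    (or the end of the prefix) is found.
    """
    compared = 0
    for a, b in zip(s, prefix):
        compared += 1
        if a != b:
            return False, compared
    return len(s) >= len(prefix), compared


# The nine roots from the results table, interned so that equal IRIs are the
# same object. Whether a real triple store interns IRIs is an assumption.
ROOTS = [
    sys.intern(YAGO_PREFIX + local)
    for local in (
        "YagoLegalActorGeo", "WaterNymph109550125", "PhysicalEntity100001930",
        "Abstraction100002137", "YagoIdentifier", "YagoLiteral",
        "YagoPermanentlyLocatedEntity", "Thing104424418", "Dryad109551040",
    )
]


def is_root_by_identity(iri):
    """At most nine identity checks; only valid if iri was interned too."""
    return any(iri is root for root in ROOTS)


print(naive_startswith("http://dbpedia.org/ontology/Person", YAGO_PREFIX))
# (False, 20)
print(naive_startswith(YAGO_PREFIX + "Thing104424418", YAGO_PREFIX))
# (True, 30)
print(is_root_by_identity(sys.intern("".join([YAGO_PREFIX, "Thing104424418"]))))
# True
```

    Even a non-YAGO DBpedia class pays for the 19 shared characters of http://dbpedia.org/ before the mismatch, while each identity check costs the same regardless of IRI length.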
