DBpedia SPARQL query returns multiple and duplicate records

轻奢々 2021-01-15 18:17

I am quite new to SPARQL and am also confused by the many syntax variants that exist for it. I am struggling to fetch unique data from DBpedia using the following query…

1 answer
  • 2021-01-15 19:05

    Let's look at one case first. For the Geffrye Museum, the duplicate results occur because the data contains more than one longitude, as the following query demonstrates:

    SELECT ?museum ?latitude ?longitude
    WHERE { 
      VALUES ?museum { dbpedia:Geffrye_Museum }
      ?museum a dbpedia-owl:Museum ; 
              geo:lat ?latitude ;  
              geo:long ?longitude .
    }
    GROUP BY ?museum ?latitude ?longitude
    


    which produces

    museum                                     latitude longitude
    http://dbpedia.org/resource/Geffrye_Museum 51.5317  -0.07663
    http://dbpedia.org/resource/Geffrye_Museum 51.5317  -0.0762194
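    A small Python model shows where the second row comes from. It is a sketch of the matching semantics, not of how DBpedia's engine works: when a subject has several values for one property, matching `?museum geo:lat ?latitude ; geo:long ?longitude` pairs every latitude with every longitude. The coordinates are the two values from the table above; the `a dbpedia-owl:Museum` check is left out, and `objects` and `solutions` are hypothetical helper names.

```python
from itertools import product

# Triples for the Geffrye Museum, using only the values from the results
# table: one latitude and two longitudes. The rdf:type triple is omitted.
triples = [
    ("dbpedia:Geffrye_Museum", "geo:lat", 51.5317),
    ("dbpedia:Geffrye_Museum", "geo:long", -0.07663),
    ("dbpedia:Geffrye_Museum", "geo:long", -0.0762194),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def solutions(museum):
    """Naive evaluation of `?museum geo:lat ?latitude ; geo:long ?longitude`:
    each latitude is paired with each longitude (a cross product)."""
    return [
        {"museum": museum, "latitude": lat, "longitude": lon}
        for lat, lon in product(objects(museum, "geo:lat"),
                                objects(museum, "geo:long"))
    ]

rows = solutions("dbpedia:Geffrye_Museum")
for row in rows:
    print(row)
```

    One museum with one latitude and two longitudes therefore yields two solution rows, which is exactly what the result table shows.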
    

    Fortunately, this is easy to remedy. As discussed in this question, you can group the results by their characteristic values and then sample, minimize, maximize, etc. over the remaining values to get exactly what you want. For instance, if you want the greatest longitude, you can project a MAX aggregate over the longitude values in your SELECT clause, as in the following query, which produces a single row.

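    SELECT ?museum ?latitude (MAX(?long) as ?longitude)
    WHERE { 
      VALUES ?museum { dbpedia:Geffrye_Museum }
      # ?long is aggregated into ?longitude; SPARQL 1.1 does not allow an AS
      # target that is already in scope, and strict engines (e.g. Jena) reject it.
      ?museum a dbpedia-owl:Museum ; 
              geo:lat ?latitude ;  
              geo:long ?long .
    }
    GROUP BY ?museum ?latitude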
    

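    The grouping step can be modeled the same way. In this sketch, the hypothetical `group_max` helper collects rows by the GROUP BY key (museum and latitude) and keeps the largest longitude in each group, which is what MAX does for numeric literals. The rows are the two from the results table; this illustrates the semantics only.

```python
from collections import defaultdict

# Solution rows as in the results table (one row per longitude).
rows = [
    {"museum": "dbpedia:Geffrye_Museum", "latitude": 51.5317, "longitude": -0.07663},
    {"museum": "dbpedia:Geffrye_Museum", "latitude": 51.5317, "longitude": -0.0762194},
]

def group_max(rows, keys, target):
    """Model of `GROUP BY <keys>` with `(MAX(?x) as ?target)`: one output row
    per distinct key tuple, carrying the largest value of `target`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[target])
    return [dict(zip(keys, key), **{target: max(values)})
            for key, values in groups.items()]

result = group_max(rows, ["museum", "latitude"], "longitude")
print(result)
```

    Because -0.0762194 is greater than -0.07663, the single remaining row keeps -0.0762194.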

    Of course, this presumes you already know to group by ?latitude and to maximize over ?longitude. It's probably better to group only by ?museum and use aggregate projections to pull out the other values, as in:

    SELECT ?museum (MAX(?lat) as ?latitude) (MAX(?long) as ?longitude)
    WHERE { 
      VALUES ?museum { dbpedia:Geffrye_Museum }
      # Aggregate inputs (?lat, ?long) use names distinct from the projected ones.
      ?museum a dbpedia-owl:Museum ; 
              geo:lat ?lat ;  
              geo:long ?long .
    }
    GROUP BY ?museum
    

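    Grouping by ?museum alone means every other projected variable has to pass through an aggregate. A sketch of that rule, assuming a hypothetical `group_by` helper that takes a map from output variable to aggregate function (again using the two rows from the results table):

```python
from collections import defaultdict

rows = [
    {"museum": "dbpedia:Geffrye_Museum", "latitude": 51.5317, "longitude": -0.07663},
    {"museum": "dbpedia:Geffrye_Museum", "latitude": 51.5317, "longitude": -0.0762194},
]

def group_by(rows, key, aggregates):
    """Model of `GROUP BY ?key` where each other projected variable goes
    through an aggregate; `aggregates` maps variable name -> function applied
    to that variable's values within one group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = []
    for k, members in groups.items():
        result_row = {key: k}
        for var, agg in aggregates.items():
            result_row[var] = agg([m[var] for m in members])
        out.append(result_row)
    return out

result = group_by(rows, "museum", {"latitude": max, "longitude": max})
print(result)
```

    The key alone determines the number of output rows, so no knowledge of which variable causes the duplicates is needed.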

    Taking this approach to all the variables produces something like this:

    SELECT DISTINCT ?Museum
      (SAMPLE(?n) as ?name)
      (SAMPLE(?abs) as ?abstract)
      (SAMPLE(?thumb) as ?thumbnail)
      (MAX(?lat) as ?latitude)
      (MAX(?long) as ?longitude)
      (SAMPLE(?photos) as ?photoCollection)
      (SAMPLE(?site) as ?website)
      (SAMPLE(?home) as ?homepage)
      (SAMPLE(?wiki) as ?wikilink)
    WHERE { 
      # Pattern variables differ from the projected names, because SPARQL 1.1
      # does not allow an AS target that is already in scope.
      ?Museum a dbpedia-owl:Museum ; 
              dbpprop:name ?n ; 
              dbpedia-owl:abstract ?abs ; 
              dbpedia-owl:thumbnail ?thumb ; 
              geo:lat ?lat ;  
              geo:long ?long ; 
              dbpprop:hasPhotoCollection ?photos ;
              dbpprop:website ?site ; 
              foaf:homepage ?home ; 
              foaf:isPrimaryTopicOf ?wiki .
      FILTER (langMatches(lang(?abs), "EN"))
      FILTER (langMatches(lang(?n), "EN"))
    }
    GROUP BY ?Museum
    LIMIT 20
    

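    The query above also relies on SAMPLE, which may return any one value from a group. The sketch below uses hypothetical data (`ex:SomeMuseum` and two made-up names) to show both halves of the problem: two names times two longitudes yield four solution rows, and grouping by the museum collapses them to one. Taking the first value is only one legal choice for SAMPLE; a real engine may pick a different one.

```python
from collections import defaultdict
from itertools import product

# Hypothetical data: one museum with two English names and two longitudes.
museum = "ex:SomeMuseum"
names = ["Example Museum", "The Example Museum"]
longitudes = [-0.07663, -0.0762194]

# `?m dbpprop:name ?name ; geo:long ?longitude` pairs every name with every
# longitude, so the row count is the product of the per-property value counts.
rows = [{"Museum": museum, "name": n, "longitude": lon}
        for n, lon in product(names, longitudes)]

def sample(values):
    """SAMPLE may return any value of the group; this model takes the first."""
    return values[0]

def group_by(rows, key, aggregates):
    """One output row per key value; other variables go through an aggregate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return [
        {key: k, **{var: agg([m[var] for m in members])
                    for var, agg in aggregates.items()}}
        for k, members in groups.items()
    ]

result = group_by(rows, "Museum", {"name": sample, "longitude": max})
print(len(rows), result)
```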

    Using an aggregate projection on every variable may seem awkward, but it works. Alternatively, you can do the aggregation in a subquery first, which keeps the variable projections clean at the cost of nesting a query. (The subquery doesn't necessarily hurt performance; it could even help. The query is a bit harder to read, though.) Note that this version only normalizes the coordinates; any other property with several values will still multiply the rows.

    SELECT * WHERE { 
      # Select museums and a single latitude and longitude for them.
      {
        SELECT ?Museum (MAX(?long) as ?longitude) (MAX(?lat) as ?latitude) WHERE {
          ?Museum a dbpedia-owl:Museum ;
                  geo:lat ?lat ;
                  geo:long ?long .
        }
        GROUP BY ?Museum
      }
      # Get the rest of the properties of the museum.
      ?Museum dbpprop:name ?name ;
              dbpedia-owl:abstract ?abstract ; 
              dbpedia-owl:thumbnail ?thumbnail ; 
              dbpprop:hasPhotoCollection ?photoCollection ;
              dbpprop:website ?website ; 
              foaf:homepage ?homepage ; 
              foaf:isPrimaryTopicOf ?wikilink .
      FILTER (langMatches(lang(?abstract), "EN"))
      FILTER (langMatches(lang(?name), "EN"))
    }
    LIMIT 20
    

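    The subquery variant can be sketched as an aggregation followed by a join. With the hypothetical data below (one museum, two longitudes, two made-up homepage values), the inner step yields a single row per museum, but the outer join still produces one row per homepage, because only the coordinates were aggregated. The helper names `subquery` and `join` are illustrative, not part of any library.

```python
from collections import defaultdict

# Hypothetical data for one museum: two longitudes and two homepage values.
coords = [
    {"Museum": "ex:SomeMuseum", "latitude": 51.5317, "longitude": -0.07663},
    {"Museum": "ex:SomeMuseum", "latitude": 51.5317, "longitude": -0.0762194},
]
homepages = [
    {"Museum": "ex:SomeMuseum", "homepage": "http://example.org/a"},
    {"Museum": "ex:SomeMuseum", "homepage": "http://example.org/b"},
]

def subquery(rows):
    """Inner SELECT: one row per museum with MAX latitude and MAX longitude."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Museum"]].append(row)
    return [
        {"Museum": m,
         "latitude": max(r["latitude"] for r in rs),
         "longitude": max(r["longitude"] for r in rs)}
        for m, rs in groups.items()
    ]

def join(left, right, var):
    """Join on one shared variable, as the outer graph pattern does."""
    return [{**l, **r} for l in left for r in right if l[var] == r[var]]

inner = subquery(coords)
outer = join(inner, homepages, "Museum")
print(len(inner), len(outer))
```

    To get one row per museum in the end, every multi-valued property has to be aggregated somewhere, either in the subquery or in the outer SELECT.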

    Finally, since you need to normalize names as well as geographic coordinates, your final query would be something like the following. In your question, you only said that you wanted to keep the “first result,” but no particular order is imposed on the results, so there is no unique “first result.” With the data at hand, you can take a MIN aggregate over the English names and you'll get the name you wanted for the Institute for Museum Research, but if you have a particular constraint in mind, you'll need to work out how to express it more specifically.

    SELECT * WHERE { 
      # Select museums and a single latitude, longitude, and name for them.
      {
        SELECT ?Museum 
               (MIN(?n) as ?name)
               (MAX(?long) as ?longitude)
               (MAX(?lat) as ?latitude)
        WHERE {
          ?Museum a dbpedia-owl:Museum ;
                  dbpprop:name ?n ;
                  geo:lat ?lat ;
                  geo:long ?long .
          FILTER (langMatches(lang(?n), "EN"))
        }
        GROUP BY ?Museum
      }
      # Get the rest of the properties of the museum.
      ?Museum dbpprop:name ?name ;
              dbpedia-owl:abstract ?abstract ; 
              dbpedia-owl:thumbnail ?thumbnail ; 
              dbpprop:hasPhotoCollection ?photoCollection ;
              dbpprop:website ?website ; 
              foaf:homepage ?homepage ; 
              foaf:isPrimaryTopicOf ?wikilink .
      FILTER (langMatches(lang(?abstract), "EN"))
    }
    LIMIT 20
    

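    Why MIN gives a repeatable answer where “keep the first result” does not: without ORDER BY the row order is unspecified, so the “first” value can change between runs or engines, while MIN depends only on the set of values. The names below are hypothetical placeholders, and `first` and `min_name` are illustrative helpers. Python's `min` compares strings by code point; SPARQL engines order simple string literals in a similar way, but how a given engine compares language-tagged literals is worth checking.

```python
# Hypothetical name values for one museum (placeholders, not DBpedia data).
names = ["Beta Museum", "Alpha Museum"]

def first(values):
    """'Keep the first result': depends on the order the rows arrive in."""
    return values[0]

def min_name(values):
    """MIN over plain strings: the smallest by code point, whatever the order."""
    return min(values)

# The same set of values, delivered in two different orders.
orders = [names, list(reversed(names))]
print({first(o) for o in orders})     # two different "first" values
print({min_name(o) for o in orders})  # one value, independent of order
```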
