Converting Freebase MQL to SPARQL

Posted by 吃可爱长大的小学妹 on 2019-12-04 09:05:49
Joshua Taylor

Max and I discussed this in chat, and this may end up being the same approach Max took, though I think it's a bit more readable. It selects 15 artists that have albums, and up to 5 albums for each one. To include artists without any albums, you'd need to make some parts optional.

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
select ?artist ?album {
  #-- select 15 bands that have albums (i.e., 
  #-- such that they are the artist *of* something).
  {
    select distinct ?artist { 
      ?artist a dbpedia-owl:Band ;
              ^dbpedia-owl:artist []
    }
    limit 15
  }

  #-- grab ordered pairs (x,y) (where x <= y) of their
  #-- albums.  By counting the x's for each y, we
  #-- keep just the first n y's.
  ?artist ^dbpedia-owl:artist ?album, ?album_
  filter ( ?album_ <= ?album ) 
}
group by ?artist ?album
having (count(?album_) <= 5) #-- take up to 5 albums for each artist
order by ?artist ?album
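
To make the counting trick in the query above easier to check without an endpoint, here is a minimal Python sketch of the same logic. The function name `top_n_per_group` and the sample data are hypothetical and not part of the original answer; the sketch only mirrors the filter, the grouping, and the having clause.

```python
from collections import defaultdict


def top_n_per_group(pairs, n):
    """Keep, for each artist, the n smallest albums by counting smaller-or-equal ones."""
    albums_by_artist = defaultdict(set)
    for artist, album in pairs:
        albums_by_artist[artist].add(album)

    kept = []
    for artist, albums in albums_by_artist.items():
        for album in albums:
            # Mirrors: filter (?album_ <= ?album), grouped by ?artist ?album
            rank = sum(1 for other in albums if other <= album)
            # Mirrors: having (count(?album_) <= n)
            if rank <= n:
                kept.append((artist, album))
    # Mirrors: order by ?artist ?album
    return sorted(kept)


# Hypothetical data: artist "A" has three albums, artist "B" has one.
pairs = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1")]
result = top_n_per_group(pairs, 2)
print(result)  # [('A', 'a1'), ('A', 'a2'), ('B', 'b1')]
```

Each album's count is its rank within its own artist, which is why a single threshold on the count behaves like a per-artist limit.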


Based on the result you want, this requires some kind of nested, correlated subquery processing, which is not directly feasible in a single SPARQL query (at least to my understanding; if it is possible, I'm all for it ;) ):

Due to the bottom-up nature of SPARQL query evaluation, the subqueries are evaluated logically first, and the results are projected up to the outer query.

Because the second LIMIT clause is applied after the join with the subquery has been evaluated, it only limits the number of results of the outer query.

Using a LIMIT k (k=5) clause in the subquery of your second attempt will indeed return the 5 artists you need, but limiting n to 50 then only caps the album results (outer query) at 50 rows in total across all 5 artists, not 50 per artist as you want. Turning the queries inside out would have a similar effect.
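
The difference between a global limit and a per-artist limit can be sketched in a few lines of Python. The data and function names below are hypothetical, and the fixed iteration order is a simplification (a SPARQL engine guarantees no order without ORDER BY); the point is only where the limit is applied.

```python
from itertools import islice

# Hypothetical stand-ins: the artists returned by the inner "LIMIT k" subquery,
# and the albums each of them would join with in the outer query.
artists = ["A", "B"]
albums = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}


def join(artists, albums):
    """Yield every (artist, album) row, like the join with the subquery."""
    for artist in artists:
        for album in albums[artist]:
            yield (artist, album)


def global_limit(n):
    """Outer LIMIT n: cut the joined rows as a whole."""
    return list(islice(join(artists, albums), n))


def per_artist_limit(n):
    """What is actually wanted: at most n albums for each artist."""
    return [(artist, album) for artist in artists for album in albums[artist][:n]]


global_rows = global_limit(2)
per_artist_rows = per_artist_limit(2)
print(global_rows)      # [('A', 'a1'), ('A', 'a2')]
print(per_artist_rows)  # [('A', 'a1'), ('A', 'a2'), ('B', 'b1'), ('B', 'b2')]
```

With the global limit, artist "B" never appears at all, which is the effect described above.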

EDIT: A possible solution is to build a subquery over all artist/album pairs and restrict it to the albums whose position in some ordering is below 50 (here, ordering by the string form of the album IRI):

PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
SELECT ?artist ?outputAlbum
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum)
        WHERE {
            ?album1 prop:artist ?artist .
            ?album2 prop:artist ?artist .
            FILTER (str(?album2) < str(?album1))
        } 
        GROUP BY ?artist 
        HAVING (count(?album2) <= 50)
        LIMIT 5
    } 
    ?outputAlbum prop:artist ?artist .
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}

EDIT 2: The last query is the naive approach, but there seems to be some inference (under an unknown regime) on the DBpedia endpoint, as shown below. A more exact query would need some more filters and DISTINCT clauses. I added a distinct count and a global count to the output to show there is still some inference somewhere:

PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?artist ?outputAlbum ?maxedCount ?inferredCrossJoinCount
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum) (count(distinct ?album2) as ?maxedCount) (count(?album2) as ?inferredCrossJoinCount)
        WHERE {
            ?artist rdf:type dbpedia-owl:Artist .
            ?album1 ?p ?artist .
            ?album2 ?p ?artist .
            FILTER (sameTerm(?p, prop:artist))
            FILTER (str(?album1) < str(?album2))
        } 
        GROUP BY ?artist 
        #HAVING count(?album2)<= 50
        LIMIT 5
    } 
    ?outputAlbum ?p ?artist .
    FILTER (sameTerm(?p, prop:artist))
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}
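
One thing that may be worth checking before attributing the gap between the two counts to inference: because the subquery groups by ?artist only, count(?album2) counts (album1, album2) pairs, so it can exceed count(distinct ?album2) even on data with no duplicates at all. The Python sketch below uses hypothetical album names to illustrate this; it says nothing about what the DBpedia endpoint actually does.

```python
# Hypothetical albums of a single artist, with no duplicates.
albums = ["a1", "a2", "a3", "a4"]

# Rows that survive FILTER (str(?album1) < str(?album2)) for this artist.
rows = [(album1, album2) for album1 in albums for album2 in albums if album1 < album2]

plain_count = len(rows)                                  # like count(?album2)
distinct_count = len({album2 for _, album2 in rows})     # like count(distinct ?album2)

print(plain_count)     # 6
print(distinct_count)  # 3
```

For n distinct albums the pair count is n(n-1)/2 while the distinct count is n-1, so the two columns only agree for artists with at most two albums.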