Question
I have developed the following SPARQL query to get a list of countries with their populations from DBpedia. I use the UNION clauses to identify which resources are current countries, because the information is inconsistent between countries: for example, there are different standards for country codes, and some countries don't have one at all.
Now the problem that I have is that some of the countries have a dbpprop:populationEstimate property but others have dbpprop:populationCensus, and I don't know how to get both of them to bind ?population. As it is now I only get the estimated population. I guess it is because having two OPTIONAL clauses to match ?population doesn't make sense, but I can't get any closer to the solution.
For example, India has dbpprop:populationCensus, but it doesn't appear in the result.
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX yago:<http://dbpedia.org/class/yago/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX category: <http://dbpedia.org/resource/Category:>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?name ?population
WHERE {
?country a dbo:Country .
?country rdfs:label ?enName .
OPTIONAL {?country dbpprop:populationEstimate ?population}
OPTIONAL {?country dbpprop:populationCensus ?population}
OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
{ ?country dbpprop:iso3166code ?code . }
UNION
{ ?country dbpprop:iso31661Alpha ?code . }
UNION
{ ?country dbpprop:countryCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
FILTER (langMatches(lang(?enName), "en"))
FILTER (!bound(?yearEnd))
FILTER (xsd:integer(?population))
BIND (str(?enName) AS ?name)
}
Thanks everyone for your help :)
Answer 1:
First, I'm going to use the prefixes defined in the DBpedia SPARQL endpoint so that we can copy and paste queries. I think the only difference is that dbo will now be dbpedia-owl. Second, you're using a number of raw data properties, but if you can, you ought to try to use properties from the ontology, as explained in this answer. That doesn't necessarily affect the results you're getting here, but you'll generally get cleaner data if you use the ontology properties.
Modifying your query
FILTER NOT EXISTS for removing countries that have ended
Let's clean up the query a little bit first, and then tend to the question of getting the various population properties. Removing countries that have an end date can be done a bit more simply. Instead of
OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
FILTER (!bound(?yearEnd))
you can use FILTER NOT EXISTS to make this a bit more direct:
FILTER NOT EXISTS { ?country dbpprop:yearEnd ?yearEnd }
In an attempt to use properties from the DBpedia ontology in preference to raw infobox data properties, you might want to consider using dbpedia-owl:dissolutionYear rather than dbpprop:yearEnd, giving:
FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }
Simplify filtering for languages
It's reasonable to expect rdfs:label values to be literals, and the lang function requires its argument to be a literal, so you don't really need to bind str(?enName) to ?name; it's sufficient just to bind ?name in the triple pattern, and then check its language (which you're doing correctly using langMatches). That is, instead of
?country rdfs:label ?enName .
FILTER (langMatches(lang(?enName), "en"))
BIND (str(?enName) AS ?name)
you can just use
?country rdfs:label ?name .
FILTER (langMatches(lang(?name), "en"))
This does mean that the name you get back will have a language tag. If you really just want the plain string, you can either BIND as you did before, or make an as expression in the select, e.g.,
SELECT DISTINCT (str(?name) as ?noLangName) ?population
Checking that population is bound and is a number
I don't think filtering on xsd:integer(?population) will do much for you either. That notation isn't a type predicate, but a casting function, so ?population is being cast to an integer and the filter then tests the result as a boolean. Under standard SPARQL semantics, a value that casts cleanly gets through except in the case of 0, which counts as false, and a value that can't be cast raises an error, which also drops the row. You'd still want to know if a country has a population of 0 though, right? However, you do only want countries with populations, so you could filter with bound:
FILTER(bound(?population))
However, since the properties here are raw infobox properties, there is some noise in the data, so we end up with values like
"Denmark"@en "- Density 57,695"@en
"Denmark"@en "- Faroe Islands"@en
which aren't useful. A better filter would just check that the value is a number (which will implicitly require that it's bound), and there is a function isNumeric for just that purpose, so we use:
FILTER (isNumeric(?population))
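To see the difference between the two filters without touching DBpedia, here is a small self-contained sketch. The three inline values are made up for illustration (a plausible count, a zero, and one of the noisy strings from above), and the behaviour described in the comments is what the SPARQL 1.1 semantics prescribe; a particular endpoint may be more lenient, so treat it as a sketch rather than a guarantee.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Illustrative data only: these values are supplied inline, not fetched from DBpedia.
SELECT ?population
WHERE {
  VALUES ?population { 1210193422 0 "- Faroe Islands"@en }

  # Keeps 1210193422 and 0, and drops the language-tagged string.
  FILTER (isNumeric(?population))

  # By contrast, FILTER (xsd:integer(?population)) would keep only 1210193422:
  # 0 has an effective boolean value of false, and the string cannot be cast.
}
```

The point of isNumeric here is that it tests the kind of value rather than its truthiness, so a legitimate 0 survives while the noisy strings do not.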
Simplifying similar UNION patterns with VALUES
You can clean up the UNION pattern by using VALUES. Instead of UNIONing several almost identical patterns, you can define a variable ?hasCode that will only have the values dbpprop:iso3166code, etc. I.e., instead of:
{ ?country dbpprop:iso3166code ?code . }
UNION
{ ?country dbpprop:iso31661Alpha ?code . }
UNION
{ ?country dbpprop:countryCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
you can use:
values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
{ ?country ?hasCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
You can do a similar thing with the ?population retrieval:
OPTIONAL {?country dbpprop:populationEstimate ?population}
OPTIONAL {?country dbpprop:populationCensus ?population}
can become:
values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
OPTIONAL { ?country ?hasPopulation ?population }
The final result
The rewritten query is now:
# Relies on the prefixes predefined by the DBpedia SPARQL endpoint.
SELECT DISTINCT ?name ?population
WHERE {
  ?country a dbpedia-owl:Country .
  ?country rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))

  # Try both population properties and keep only numeric values.
  values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
  OPTIONAL { ?country ?hasPopulation ?population }
  FILTER (isNumeric(?population))

  # Drop countries that have been dissolved.
  FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }

  # A current country has some country code or is a UN member state.
  values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
  { ?country ?hasCode ?code . }
  UNION
  { ?country a yago:MemberStatesOfTheUnitedNations . }
}
SPARQL results
India now appears in the results with a population:
"India"@en 1210193422
Answer 2:
How to work around the problem
I think I have an idea of how you could work around this problem.
For the optional clauses, use separate variables
OPTIONAL {?country dbpprop:populationEstimate ?populationEstimate}
OPTIONAL {?country dbpprop:populationCensus ?populationCensus}
OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
Then, bind one of them to ?population
BIND(IF(bound(?populationEstimate), ?populationEstimate, ?populationCensus) as ?population)
Finally, check the bound variable in your filter expression
FILTER (xsd:integer(?population))
The rest of the query remains the same. I've tested this against the DBpedia SPARQL endpoint and at first glance, it seems to yield the right results.
Let me know if this is correct.
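As a side note, the BIND step above could probably also be written with COALESCE, which returns the first argument that evaluates without an error (an unbound variable counts as an error). The sketch below is self-contained and uses made-up inline rows instead of DBpedia data; the country labels "A", "B" and "C" and their numbers are purely hypothetical.

```sparql
# Illustrative data only: UNDEF stands in for a property the country lacks.
SELECT ?country ?population
WHERE {
  VALUES (?country ?populationEstimate ?populationCensus) {
    ("A" 100   UNDEF)   # estimate only
    ("B" UNDEF 200)     # census only
    ("C" 300   400)     # both: the estimate is preferred
  }

  # Same preference order as IF(bound(?populationEstimate), ...): estimate first, then census.
  BIND (COALESCE(?populationEstimate, ?populationCensus) AS ?population)
}
```

Under standard SPARQL semantics this should give 100 for "A", 200 for "B" and 300 for "C". The advantage over IF is mainly readability, and it extends naturally if a third fallback property is ever needed.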
The full query
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX yago:<http://dbpedia.org/class/yago/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX category: <http://dbpedia.org/resource/Category:>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?name ?population
WHERE {
?country a dbo:Country .
?country rdfs:label ?enName .
OPTIONAL {?country dbpprop:populationEstimate ?populationEstimate}
OPTIONAL {?country dbpprop:populationCensus ?populationCensus}
OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
BIND(IF(bound(?populationEstimate), ?populationEstimate, ?populationCensus) as ?population)
FILTER (langMatches(lang(?enName), "en"))
FILTER (!bound(?yearEnd))
FILTER (xsd:integer(?population))
{ ?country dbpprop:iso3166code ?code . }
UNION
{ ?country dbpprop:iso31661Alpha ?code . }
UNION
{ ?country dbpprop:countryCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
BIND (str(?enName) AS ?name)
}
Source: https://stackoverflow.com/questions/19145979/sparql-query-to-retrieve-countries-population-from-dbpedia