Question
I'm wondering whether we can tell if two resources in DBpedia have the same category, or belong to categories that share some common supercategory. I tried this query on the DBpedia endpoint, but it's wrong:
select distinct ?s ?s2 where {
?s skos:subject <http :// dbpedia.org/resource/ Category ?c.
?s2 skos:subject <http :// dbpedia.org/resource/ Category ?c2.
?c=?c2.
}
Answer 1:
DBpedia doesn't use skos:subject for resources, but rather relates resources to their Wikipedia categories using dcterms:subject. You can find out what data is available by browsing the resource pages. E.g., you might have a look at http://dbpedia.org/resource/Mount_Monadnock. If you want to find categories that two resources have in common, just use the same variable. E.g.,
?subject1 dcterms:subject ?category .
?subject2 dcterms:subject ?category .
You can write that more concisely with the ^property notation and object lists. Writing o ^p s is the same as writing s p o. Object lists let you write s p o1, o2 instead of s p o1. s p o2. Putting these together, we can write:
?category ^dcterms:subject ?subject1, ?subject2 .
E.g., here's a query that finds common categories of Mount Monadnock and Spofford Lake. There's just one result, Landforms of Cheshire County, New Hampshire, since they only have one category in common.
select * where {
?category ^dcterms:subject dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
Now, categories are related to their supercategories in DBpedia by skos:broader, as you can see in http://dbpedia.org/page/Category:Landforms_of_Cheshire_County,_New_Hampshire, where there are links to
- http://dbpedia.org/resource/Category:Landforms_of_New_Hampshire_by_county and
- http://dbpedia.org/resource/Category:Geography_of_Cheshire_County,_New_Hampshire
Now, this means that if two things have some common category (or supercategory), each will be related to that category by a path starting with a dcterms:subject link and followed by zero or more skos:broader links. Thus, you could use a query like
select * where {
?category ^(dcterms:subject/skos:broader*) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
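To make the path semantics concrete, here is a minimal Python sketch that evaluates "one dcterms:subject link, then zero or more skos:broader links" over a tiny in-memory graph and intersects the results for two resources. The SUBJECT and BROADER dictionaries are made-up stand-ins loosely modelled on the categories named above, not real DBpedia data, and the function names are purely illustrative.

```python
# Toy data: hypothetical stand-ins for dcterms:subject and skos:broader triples.
SUBJECT = {
    "Mount_Monadnock": {"Landforms_of_Cheshire_County", "Mountains_of_New_Hampshire"},
    "Spofford_Lake": {"Landforms_of_Cheshire_County", "Lakes_of_New_Hampshire"},
}
BROADER = {
    "Landforms_of_Cheshire_County": {"Geography_of_Cheshire_County"},
    "Mountains_of_New_Hampshire": {"Landforms_of_New_Hampshire"},
    "Lakes_of_New_Hampshire": {"Landforms_of_New_Hampshire"},
}


def reachable_categories(resource, max_depth=None):
    """Categories reached by one subject link, then 0..max_depth broader links.

    max_depth=None means no upper bound (the analogue of skos:broader*).
    """
    seen = set(SUBJECT.get(resource, ()))
    frontier = set(seen)
    depth = 0
    # Breadth-first walk up the broader links, one level per iteration.
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for category in frontier:
            for parent in BROADER.get(category, ()):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier
        depth += 1
    return seen


def common_categories(a, b, max_depth=None):
    """Categories (or supercategories) that both resources reach."""
    return reachable_categories(a, max_depth) & reachable_categories(b, max_depth)
```

With max_depth=0 this reduces to the direct common-category case; leaving it unbounded also picks up shared supercategories in the toy graph.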
You'll find, unfortunately, that the DBpedia endpoint runs into memory usage problems with that query, so you can't run it exactly like that. However, the DBpedia SPARQL endpoint supports a property path feature that actually didn't make it into the standard; you can write p{n,m} to denote a chain of length at least n and at most m. This means you can put ranges on the path that will get you most of the same results as *:
select distinct ?category where {
?category ^(dcterms:subject/(skos:broader{0,3})) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
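Since p{n,m} is not part of standard SPARQL 1.1, one portable alternative is to spell the bounded path out as n mandatory steps followed by (m - n) optional ones, e.g. p?/p?/p? for p{0,3}. The Python sketch below generates such a path and drops it into the query shape used above. The helper names are hypothetical, and whether the expanded form is any cheaper for the DBpedia endpoint to evaluate is an assumption this sketch does not test.

```python
def expand_bounded_path(prop, lower, upper):
    """Rewrite prop{lower,upper} using only standard SPARQL 1.1 path operators."""
    if lower < 0 or upper < max(lower, 1):
        raise ValueError("need 0 <= lower <= upper and upper >= 1")
    # 'lower' mandatory steps, then (upper - lower) optional steps.
    steps = [prop] * lower + [prop + "?"] * (upper - lower)
    return "/".join(steps)


def common_category_query(resource_a, resource_b, max_broader):
    """Build a query for common categories within max_broader broader steps."""
    path = expand_bounded_path("skos:broader", 0, max_broader)
    return (
        "select distinct ?category where {\n"
        f"  ?category ^(dcterms:subject/{path}) {resource_a}, {resource_b} .\n"
        "}"
    )


if __name__ == "__main__":
    print(common_category_query("dbpedia:Mount_Monadnock", "dbpedia:Spofford_Lake", 3))
```

The generated string relies on the dcterms:, skos:, and dbpedia: prefixes being predefined by the endpoint, as the queries above do.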
This works with Tom Cruise and Madonna as well, though you'll need to scale back the path length a bit because of the memory issues. For instance, the following query returns seventy-four results.
select distinct ?category where {
?category
^(dcterms:subject/(skos:broader{0,2}))
<http://dbpedia.org/resource/Tom_Cruise>,
<http://dbpedia.org/resource/Madonna_(entertainer)> .
}
It's worth noting, though, that Wikipedia categories aren't types. So while Mount Monadnock and Spofford Lake are both rightly considered to be landforms, neither is a geography or, as the supercategory queries might suggest, New Hampshire. Wikipedia categories are much more about topics than about a type hierarchy.
Related reading
There's a related (but not quite duplicate) question that you might find helpful as well: Using SPARQL to locate a subject with multiple occurrences of same property.
Source: https://stackoverflow.com/questions/20762696/finding-common-categories-or-supercategories-of-resources