Question
In Wikidata (Wikidata SPARQL endpoint), is there a way to order the SPARQL query results with something like a PageRank?
SELECT DISTINCT ?entity ?entityLabel WHERE {
  ?entity wdt:P31 wd:Q5.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
} LIMIT 100 OFFSET 0
Can we specify a field to order the results by, such that the entity at the top is more notable/important/recognizable than the one following it, and so on?
Answer 1:
PageRank does not seem to make much sense for Wikidata: large classes and large aggregates will obviously come out on top.
Also, unlike web links, RDF predicates are "navigable" in both directions; which URI is the subject and which is the object is just a matter of design.
However, Andreas Thalhammer continues his work on this. The top 10 Wikidata entities, with their PageRank scores, are:
Q729 animal 24996.77
Q30 USA 24772.45
Q1360 Arthropoda 16930.883
Q1390 insects 16531.822
Q35409 family 14403.091
Q756 plant 14019.927
Q142 France 13723.484
Q34740 genus 13718.484
Q16 Canada 12321.178
Q159 Russia 11707.16
Unfortunately, Wikidata pageranks are not published on the (same) endpoint, so one can't query them using SPARQL.
Fortunately, one can figure out some kind of a rank oneself. Possible options are:
- Number of outgoing statements (precalculated; ?outcoming in the query below);
- Number of sitelinks (precalculated);
- Number of incoming statements (in the example below, only truthy statements are counted).
Example query:
SELECT ?item ?itemLabel ?outcoming ?sitelinks ?incoming {
  ?item wdt:P463 wd:Q458 .
  ?item wikibase:statements ?outcoming .
  ?item wikibase:sitelinks ?sitelinks .
  {
    SELECT (count(?s) AS ?incoming) ?item WHERE {
      ?item wdt:P463 wd:Q458 .
      ?s ?p ?item .
      [] wikibase:directClaim ?p
    } GROUP BY ?item
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }.
} ORDER BY DESC (?incoming)
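To tie this back to the query in the question, here is a minimal sketch that rebuilds it with an ORDER BY on one of the two precalculated metrics. The helper name build_ranked_query, the METRICS table and the ?rank variable are made up for illustration; only wikibase:sitelinks and wikibase:statements come from the query above. The function just produces a query string and does not contact any endpoint.

```python
# Hypothetical helper: it only builds a SPARQL string, nothing is sent anywhere.
# Only the two precalculated predicates used in the answer are supported.
METRICS = {
    "sitelinks": "wikibase:sitelinks",
    "statements": "wikibase:statements",
}


def build_ranked_query(class_qid="Q5", metric="sitelinks", limit=100, offset=0):
    """Return the question's query with an ORDER BY on a precalculated metric."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return "\n".join([
        "SELECT DISTINCT ?entity ?entityLabel ?rank WHERE {",
        f"  ?entity wdt:P31 wd:{class_qid} .",
        # The chosen metric is bound to ?rank (an assumed variable name).
        f"  ?entity {METRICS[metric]} ?rank .",
        "  SERVICE wikibase:label {",
        '    bd:serviceParam wikibase:language "en" .',
        "  }",
        "}",
        "ORDER BY DESC(?rank)",
        f"LIMIT {int(limit)} OFFSET {int(offset)}",
    ])


if __name__ == "__main__":
    print(build_ranked_query())
```

Sorting a very large class such as Q5 may well run into the query service timeout, so in practice an extra filter on ?rank or a narrower class is probably needed; that part is not covered by this sketch.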
As of October 2017, all these metrics are more or less correlated.
Below are the correlation coefficients of these measures for the EU members.
Pearson
-------
           outcoming  sitelinks  incoming  pagerank
outcoming     1.0000     0.6907    0.7416    0.8652
sitelinks     0.6907     1.0000    0.4314    0.5717
incoming      0.7416     0.4314    1.0000    0.8978
pagerank      0.8652     0.5717    0.8978    1.0000

Spearman
--------
           outcoming  sitelinks  incoming  pagerank
outcoming     1.0000     0.6869    0.7619    0.8736
sitelinks     0.6869     1.0000    0.7680    0.8342
incoming      0.7619     0.7680    1.0000    0.8872
pagerank      0.8736     0.8342    0.8872    1.0000

Kendall
-------
           outcoming  sitelinks  incoming  pagerank
outcoming     1.0000     0.4914    0.5661    0.7143
sitelinks     0.4914     1.0000    0.5764    0.6454
incoming      0.5661     0.5764    1.0000    0.7249
pagerank      0.7143     0.6454    0.7249    1.0000
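For readers who want to reproduce this kind of comparison on their own query results, the three coefficients can be sketched in plain Python as below. This is not the code used to produce the tables above. The function names are illustrative, and the Kendall variant shown is the simple tau-a without tie correction, which is an assumption since the answer does not say which variant was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)
```

Each function takes two columns of the query result as lists of numbers, for example the ?sitelinks and ?incoming values in the same item order.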
See also:
- https://phabricator.wikimedia.org/T143424
- https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API#PageRank
- https://phabricator.wikimedia.org/T162279
Source: https://stackoverflow.com/questions/39438022/wikidata-results-sorted-by-something-similar-to-a-pagerank