Question
I'm trying to extract internal links from Wikipedia pages. This is the query I'm using:
/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=pageTitle
However, the result does not reflect what's on the wiki page. Take, for example, a random article here. There are only about a dozen links on this page. However, when I make the query
/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=Von_Mises%E2%80%93Fisher_distribution
I get back 187 links. I guess the API might have a database of all the links that have ever been added to the page, including those from all earlier revisions. Is that the case? How can I get the links from only the latest revision?
Answer 1:
The database has the correct list of the links in the current version of the articles. All the links you get from the API are in fact in the article. However, most of them are hidden in the (twice collapsed) navigation box at the bottom (scroll to the bottom, click "show" on the blue bar, then click "show" on the additional blue bars you now see).
Note that these links are on the page, but not defined in the wikitext - they come from the {{ProbDistributions}} navigation template (and the templates that it in turn includes).
Sadly, there is no good way to list only the links that are directly/explicitly defined on a page, since template expansion happens before the actual parsing of the wiki syntax.
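If an approximation is acceptable, one possible workaround is to fetch the raw wikitext of the current revision and pick out the `[[...]]` targets yourself, then compare them with what `prop=links` reports. The following is only a rough sketch under several assumptions: the endpoint is assumed to be English Wikipedia, the helper names (`api_get`, `all_links`, `current_wikitext`, `explicit_links`, `template_only_links`) are made up for illustration, and the regular expression is a heuristic that ignores comments, `<nowiki>` blocks, parser functions, and titles that legitimately contain a colon.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; the question only shows the relative /w/api.php path.
API_URL = "https://en.wikipedia.org/w/api.php"

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]]; captures the target only.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def explicit_links(wikitext):
    """Heuristically list link targets written literally in the wikitext."""
    targets = []
    for match in WIKILINK_RE.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        # Crude filter: skip File:, Category:, interwiki prefixes and the like.
        if not target or ":" in target:
            continue
        # Wikipedia treats the first letter of a title as case-insensitive.
        target = target[0].upper() + target[1:]
        if target not in targets:
            targets.append(target)
    return targets


def api_get(params):
    """Send one GET request to the API and decode the JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json", "formatversion": "2"})
    request = urllib.request.Request(
        API_URL + "?" + query,
        headers={"User-Agent": "wikilink-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_links(title):
    """All main-namespace links the API reports for the current revision."""
    params = {"action": "query", "prop": "links", "plnamespace": "0",
              "pllimit": "max", "titles": title}
    links = []
    while True:
        data = api_get(params)
        for page in data.get("query", {}).get("pages", []):
            links.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            return links
        # Feed the continuation values back in to fetch the next batch.
        params = {**params, **data["continue"]}


def current_wikitext(title):
    """Raw wikitext of the latest revision of the page."""
    data = api_get({"action": "query", "prop": "revisions", "rvprop": "content",
                    "rvslots": "main", "titles": title})
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]


def template_only_links(title):
    """Links reported by the API that do not appear literally in the wikitext."""
    written = set(explicit_links(current_wikitext(title)))
    return [target for target in all_links(title) if target not in written]
```

Calling `explicit_links(current_wikitext("Von Mises–Fisher distribution"))` should then give roughly the links visible in the article body, while `template_only_links(...)` should give the ones contributed by templates such as the navigation box. Treat the split as approximate, since links are compared by title text and redirects are not resolved.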
Source: https://stackoverflow.com/questions/22359695/how-to-get-internal-link-from-latest-revision-of-a-wikipedia-page