How to get internal links from the latest revision of a Wikipedia page?

Submitted by 吃可爱长大的小学妹 on 2019-12-23 02:57:07

Question


I'm trying to extract internal links from Wikipedia pages. This is the query I'm using:

/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=pageTitle

However, the result does not reflect what's on the wiki page. Take, for example, the Von Mises–Fisher distribution article. There are only about a dozen links on this page. However, when I make the query,

/w/api.php?action=query&prop=links&format=xml&plnamespace=0&pllimit=max&titles=Von_Mises%E2%80%93Fisher_distribution

I get back 187 links. I guess the API might have a database of all the links that have ever been added to the page, including those from all past revisions. Is that the case? How can I get the links from only the latest revision?


Answer 1:


The database has the correct list of the links in the current version of the articles. All the links you get from the API are in fact in the article. However, most of them are hidden in the (twice collapsed) navigation box at the bottom (scroll to the bottom, click "show" on the blue bar, then click "show" on the additional blue bars you now see).

Note that these links are on the page, but not defined in the wikitext - they come from the {{ProbDistributions}} navigation template (and the template that template in turn includes).
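To reproduce the list the API returns (template-generated links included), the query from the question can be scripted. The following is a minimal sketch, not a definitive client: the endpoint `https://en.wikipedia.org/w/api.php` and the helper names `build_links_url` and `titles_from_response` are assumptions made for illustration, and `format=json` is used instead of the question's `format=xml` because it is easier to handle in Python. No request is sent here; the functions only build the URL and read a decoded response.

```python
from urllib.parse import urlencode

# Assumed endpoint (English Wikipedia); the question only gives the /w/api.php path.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_url(title, cont=None):
    """Build the prop=links query URL for one page title.

    `cont` is the optional "continue" object from a previous response,
    which is merged into the parameters to fetch the next batch.
    """
    params = {
        "action": "query",
        "prop": "links",
        "format": "json",
        "plnamespace": 0,   # main (article) namespace only
        "pllimit": "max",
        "titles": title,
    }
    if cont:
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def titles_from_response(data):
    """Collect link titles from a decoded JSON response of the query above."""
    titles = []
    for page in data.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles
```

If a response contains a top-level `continue` object, passing it back as `cont` requests the next batch. Every title collected this way is a link in the rendered page, whether it was written in the wikitext or produced by a template.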

Sadly, there is no good way to list only the links that are directly/explicitly defined on a page, since templates are expanded before the wiki syntax is actually parsed.
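A rough approximation is still possible outside the API: fetch the raw wikitext of the latest revision (for example with `action=query&prop=revisions&rvprop=content&rvslots=main`) and scan it for `[[...]]` links yourself. The sketch below assumes the wikitext has already been downloaded into a string. The function name `explicit_links` and the regular expression are illustrative, not part of MediaWiki, and the result is only a heuristic: it ignores `<nowiki>` sections and comments, and it will miss links produced by parser functions.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target page without the section or the label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def explicit_links(wikitext):
    """Return link targets written directly in the wikitext, in order, de-duplicated."""
    seen = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        # Crude namespace filter: drops File:, Category:, interwiki links,
        # and also any article whose title happens to contain a colon.
        if not target or ":" in target:
            continue
        # MediaWiki treats the first letter of a title as case-insensitive.
        target = target[0].upper() + target[1:]
        if target not in seen:
            seen.append(target)
    return seen
```

Because the scan runs on unexpanded wikitext, a template call such as `{{ProbDistributions}}` contributes nothing, which is the behaviour the question was after.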



Source: https://stackoverflow.com/questions/22359695/how-to-get-internal-link-from-latest-revision-of-a-wikipedia-page
