How to get the infobox from a Wikipedia article with the MediaWiki API?

六月ゝ 毕业季﹏ posted on 2019-11-27 07:26:45

You can do it with a URL call to the Wikipedia API like this:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0

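Replace the value of titles= with your page title, and change format=xmlfm to format=json if you want the response as JSON. The rvsection=0 parameter limits the result to the lead section, which is where the infobox normally sits.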
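
The query above returns the whole lead section as wikitext, so the infobox still has to be cut out of it. A minimal sketch of one way to do that, by matching braces and then splitting on top-level pipes. The function names extract_infobox and infobox_fields are made up for this example, and the sample string is a hand-written stand-in, not the real article text. It assumes the template name starts with "Infobox" and that the braces are balanced.

```python
import re


def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template, braces included, or None."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced


def infobox_fields(template):
    """Split a template into {name: value}, ignoring pipes nested in {{ }} or [[ ]]."""
    body = template[2:-2]
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:  # parts[0] is the template name
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hand-written sample wikitext, for illustration only
sample = (
    "{{Use dmy dates}}\n"
    "{{Infobox album\n"
    "| name = Scary Monsters and Nice Sprites\n"
    "| artist = [[Skrillex]]\n"
    "| released = {{Start date|2010|10|22}}\n"
    "}}\n"
    "'''Scary Monsters and Nice Sprites''' is an EP."
)
box = extract_infobox(sample)
fields = infobox_fields(box)
print(fields["artist"])
```

The values come back as raw wikitext (links and nested templates are left untouched), so they usually need a second cleaning pass before they are usable.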

jpatokal

Instead of parsing infoboxes yourself, which is quite complicated, take a look at DBpedia, which has Wikipedia infoboxes extracted as database objects.

Building on @garry's answer, you can have Wikipedia parse the infobox into HTML for you via the rvparse parameter (newer MediaWiki versions mark rvparse as deprecated in favour of action=parse), like so:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse

Note that neither method returns just the infobox. From the HTML content, however, you can extract the table with class infobox (using, e.g., BeautifulSoup).
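
With BeautifulSoup the lookup is roughly soup.find("table", class_="infobox"). As a dependency-free sketch of the same idea, the following uses the standard library's html.parser instead (a stand-in, not BeautifulSoup itself). The class name InfoboxExtractor is invented for this example, and it assumes the infobox is a table whose rows pair a th label with a td value. The sample HTML is made up.

```python
from html.parser import HTMLParser


class InfoboxExtractor(HTMLParser):
    """Collects (label, value) rows from the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.depth = 0                    # <table> nesting depth inside the infobox
        self.done = False                 # True once the first infobox is closed
        self.cell = None                  # "th" or "td" while inside a cell
        self.row = {"th": "", "td": ""}   # text collected for the current row
        self.rows = []                    # finished (label, value) pairs

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table":
            # enter the infobox, or track a table nested inside it
            if self.depth or "infobox" in classes:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.row = {"th": "", "td": ""}
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            self.done = self.depth == 0
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            label = self.row["th"].strip()
            value = self.row["td"].strip()
            if label and value:
                self.rows.append((label, value))

    def handle_data(self, data):
        if self.depth and self.cell and not self.done:
            self.row[self.cell] += data


# Made-up HTML in the general shape of a parsed lead section
sample_html = (
    '<p>intro</p>'
    '<table class="infobox vevent"><tbody>'
    '<tr><th>Released</th><td>October 22, <b>2010</b></td></tr>'
    '<tr><td colspan="2">cover image</td></tr>'
    '</tbody></table>'
    '<table><tr><th>Other</th><td>table</td></tr></table>'
)
parser = InfoboxExtractor()
parser.feed(sample_html)
print(parser.rows)
```

Rows without both a label and a value (image rows, section headers) are skipped, and any table after the first infobox is ignored.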

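In Python, you can do something like the following:

import requests

url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse"
resp = requests.get(url).json()
# 'pages' is keyed by page ID, so take the first (and only) entry
page_one = next(iter(resp['query']['pages'].values()))
revisions = page_one.get('revisions', [])
# in the default JSON format the revision content sits under the '*' key
html = revisions[0]['*']
# now parse the html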

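If the page has an infobox on the right-hand side, you can use this URL to obtain it as plain text (raw wikitext). My example uses the element Hydrogen; replace "hydrogen" with your title. Note that this only works when the infobox lives on its own template page, as it does here; most articles embed the infobox directly in the article text.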

https://en.wikipedia.org/w/index.php?action=raw&title=Template:Infobox%20hydrogen

If you are looking for JSON format, use this URL, but it's not pretty.

https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json
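
To reuse the two URLs above for another template, they can be assembled with the standard library. This is a sketch only: template_urls and html_from_parse are names invented here, and the ['parse']['text']['*'] path assumes the default (legacy) JSON layout of action=parse.

```python
from urllib.parse import quote, urlencode

BASE = "https://en.wikipedia.org/w/"


def template_urls(template_title):
    """Return (raw wikitext URL, action=parse JSON URL) for a template page."""
    def query(params):
        # quote_via=quote encodes spaces as %20; safe=":" keeps "Template:" readable
        return urlencode(params, quote_via=quote, safe=":")

    raw_url = BASE + "index.php?" + query(
        {"action": "raw", "title": template_title}
    )
    parse_url = BASE + "api.php?" + query(
        {"action": "parse", "page": template_title, "format": "json"}
    )
    return raw_url, parse_url


def html_from_parse(resp):
    """Pull the rendered HTML out of an action=parse JSON response (assumed layout)."""
    return resp["parse"]["text"]["*"]


raw_url, parse_url = template_urls("Template:Infobox hydrogen")
print(raw_url)
print(parse_url)
```

The HTML returned by html_from_parse can then be searched for the table with class infobox, in the same way as the rvparse output.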
