Retrieve first paragraph of Wikipedia article

Posted by 依然范特西╮ on 2019-12-24 00:16:53

Question


I've been trying to understand the MediaWiki documentation for the past 2 days and I can't figure out how to retrieve the first paragraph of a Wikipedia article through the MediaWiki API.

Could someone point me in the right direction?

I am about to resort to file_get_contents, but I'm confident there's a "cleaner" solution.


Answer 1:


Don't try to use the raw API; use a client wrapper instead. Here's a long list to choose from, all for PHP:

http://en.wikipedia.org/wiki/Wikipedia:PHP_bot_framework_table




Answer 2:


file_get_contents is pretty clean: it gives you the page's HTML. You can then parse that HTML with DOMDocument, which works much like the DOM API in JavaScript. You can fetch all the <p> elements in a div, for example, or just grab the first one.

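For example:

// 'the url' is a placeholder for the article's URL
$html = file_get_contents('the url');

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of silencing them with @
$dom->loadHTML($html);
libxml_clear_errors();

// Text content of the first <p> element in the document
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;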
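
The first <p> in a page is not always the paragraph you want: it can be empty or contain only whitespace. A minimal sketch of one way to handle that, assuming that skipping empty paragraphs is good enough for your pages: the function name firstNonEmptyParagraph and the sample markup are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// first <p> that contains non-whitespace text, or null if there is none.
function firstNonEmptyParagraph(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}

// Made-up sample markup standing in for a downloaded page.
$sample = '<div><p class="empty"> </p><p>PHP is a <b>scripting</b> language.</p><p>Second.</p></div>';
echo firstNonEmptyParagraph($sample), "\n";
```

In real use you would pass the string returned by file_get_contents instead of $sample. The null return lets the caller tell "no paragraph found" apart from an empty result.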


Source: https://stackoverflow.com/questions/9389699/retrieve-first-paragraph-of-wikipedia-article
