Question
I've been trying to understand the MediaWiki documentation for the past two days, and I can't figure out how to retrieve the first paragraph of a Wikipedia article through the MediaWiki API.
Could someone point me in the right direction?
I am about to resort to file_get_contents, but I'm confident there's a "cleaner" solution.
Answer 1:
Don't try to use the raw API; use a client wrapper instead. Here's a long list to choose from, all for PHP:
http://en.wikipedia.org/wiki/Wikipedia:PHP_bot_framework_table
Answer 2:
file_get_contents is pretty clean: it gives you the page's HTML.
You can then parse that HTML using DOMDocument.
DOMDocument works much like the DOM in JavaScript, so you can fetch all the <p> elements in a div, for example, or grab just the first one.
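For example:
$html = file_get_contents('the url');
$dom = new DOMDocument();
// Collect libxml warnings about malformed HTML instead of silencing them with @
libxml_use_internal_errors(true);
$dom->loadHTML($html);
// Text content of the first <p> element in the document
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;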
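One caveat with item(0): the first <p> on a page is not always the paragraph you want, since it may be empty or contain only whitespace. Below is a minimal sketch of the same DOMDocument approach that skips such paragraphs. The function name firstParagraph and the inline sample HTML are made up for illustration, and it assumes that "first paragraph" means the first <p> with any text in it.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the text of the
// first <p> element that actually contains text, or null if there is none.
function firstParagraph(string $html): ?string
{
    $dom = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of using @,
    // then restore whatever error-handling mode was active before.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Walk the <p> elements in document order and stop at the first
    // one whose text is not empty after trimming whitespace.
    foreach ($dom->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }

    return null;
}

// Inline sample so the sketch runs without network access; in practice
// $sample would be replaced by the result of file_get_contents().
$sample = '<div><p> </p><p>PHP is a scripting language.</p><p>Second.</p></div>';
echo firstParagraph($sample), "\n"; // prints: PHP is a scripting language.
```

Returning null rather than an empty string lets the caller tell "no paragraph found" apart from a paragraph that happens to be short.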
Source: https://stackoverflow.com/questions/9389699/retrieve-first-paragraph-of-wikipedia-article