Is there any way I can access the thumbnail picture of any wikipedia page by using an API? I mean the image on the top right side in box. Is there any APIs for that?
See this related question on an API for Wikipedia. However, I would not know if it is possible to retrieve the thumbnail picture through an API.
You can also consider just parsing the web page to find the image URL, and retrieve the image that way.
Here is my list of XPaths I have found work for 95 percent of articles. the main ones are 1, 2 3 and 4. A lot of articles are not formatted correctly and these would be edge cases:
You can use a DOM parsing lib to fetch image using the XPath.
static NSString *kWikipediaImageXPath2 = @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath3 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
static NSString *kWikipediaImageXPath1 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath4 = @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath5 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
static NSString *kWikipediaImageXPath6 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
static NSString *kWikipediaImageXPath7 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";
I used a ObjC wrapper called Hpple around libxml2.2 to pull out the image url. Hope this helps
Way 1: You can try some query like this:
http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0
in the response, you can see the Image
tag.
<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>
Way 2: use query http://en.wikipedia.org/w/index.php?action=render&title=italy
then you can get a raw html code, you can get the image use something like PHP Simple HTML DOM Parser
http://simplehtmldom.sourceforge.net
I have no time write it to you. just give you some advice, thanks.
I think not, but you can capture the image using a link parser HTML documents
You can get the thumbnail of any wikipedia page using prop=pageimages
. For example:
http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100
And you will get the thumbnail full URL.
http://en.wikipedia.org/w/api.php
Look at prop=images
.
It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
or to calculate the URL via the filename's hash.
Unfortunately, while the array of images returned by prop=images
is in the order they are found on the page, the first can not be guaranteed to be the image in the info box because sometimes a page will include an image before the infobox (most of the time icons for metadata about the page: e.g. "this article is locked").
Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.