Accessing main picture of wikipedia page by API

Is there any way I can access the thumbnail picture of any Wikipedia page by using an API? I mean the image in the box on the top right side. Are there any APIs for that?

14 answers

挽巷

2020-11-30 18:24

See this related question on an API for Wikipedia. However, I do not know whether it is possible to retrieve the thumbnail picture through an API.

You can also consider just parsing the web page to find the image URL, and retrieve the image that way.

无人共我

2020-11-30 18:29

Here is a list of XPaths that I have found to work for about 95 percent of articles. You can use a DOM parsing library to fetch the image using these XPaths. The main ones are 1, 2, 3 and 4. A lot of articles are not formatted consistently, and those are the edge cases:

```
static NSString *kWikipediaImageXPath2 = @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath3 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
static NSString *kWikipediaImageXPath1 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath4 = @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath5 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
static NSString *kWikipediaImageXPath6 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
static NSString *kWikipediaImageXPath7 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";
```

I used an ObjC wrapper around libxml2.2 called Hpple to pull out the image URL. Hope this helps.
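For readers not working in Objective-C, the same idea can be sketched in Python. This is only an illustration under assumptions: it uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and requires well-formed markup, so real Wikipedia HTML would likely need a proper HTML parser instead. The function name `find_infobox_image`, the `CANDIDATE_XPATHS` list, and the `SAMPLE` markup are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two of the XPaths from the list above, rewritten to start with "."
# because ElementTree paths are relative to the element being searched.
CANDIDATE_XPATHS = [
    ".//*[@id='mw-content-text']/div[1]/table/tr[2]/td/a/img",
    ".//*[@id='mw-content-text']/div[1]/table/tr[1]/td/a/img",
]


def find_infobox_image(xhtml: str):
    """Return the src of the first <img> matched by a candidate XPath, or None."""
    root = ET.fromstring(xhtml.strip())
    for xpath in CANDIDATE_XPATHS:
        img = root.find(xpath)
        if img is not None:
            return img.get("src")
    return None


# Hypothetical, hand-written markup that mimics an infobox table.
SAMPLE = """
<html><body>
<div id="mw-content-text"><div>
<table>
<tr><td>Italy</td></tr>
<tr><td><a href="/wiki/File:Example.png"><img src="//upload.example/Example.png"/></a></td></tr>
</table>
</div></div>
</body></html>
"""

print(find_infobox_image(SAMPLE))
```

Trying the paths in order and stopping at the first hit mirrors the "main ones first, edge cases later" approach described above.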

离开以前

2020-11-30 18:33
Way 1: You can try a query like this:

http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0

In the response, you can see the Image tag:
```
<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>
```
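As a rough sketch, the `source` attribute of the Image tag in a response like the one above could be pulled out with Python's standard XML parser. The helper names `local_name` and `first_image_source` are invented for this example, and the sample is an abridged copy of the item shown above. The code ignores XML namespaces in case the full response declares one, which is an assumption rather than something verified here.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def first_image_source(xml_text: str):
    """Return the 'source' attribute of the first Image element, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() also visits the root element itself
        if local_name(elem.tag) == "Image":
            return elem.get("source")
    return None


# Abridged copy of the item from the response above.
SAMPLE_ITEM = """<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Url xml:space="preserve">http://en.wikipedia.org/wiki/Italy_national_rugby_union_team</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>"""

print(first_image_source(SAMPLE_ITEM))
```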
Way 2: Use the query http://en.wikipedia.org/w/index.php?action=render&title=italy

This returns the raw HTML of the page, from which you can extract the image using something like PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net

I have no time to write the code for you, so this is just some advice. Thanks.
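A minimal sketch of Way 2, written in Python rather than PHP, might look like the following. It assumes the rendered HTML has already been downloaded into a string, and it simply takes the first `<img>` that has a `src`, which may or may not be the infobox image on a given page. `FirstImageFinder`, `first_image_src`, and the sample markup are hypothetical names and data for illustration.

```python
from html.parser import HTMLParser


class FirstImageFinder(HTMLParser):
    """Remember the src of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_image_src(html: str):
    finder = FirstImageFinder()
    finder.feed(html)
    return finder.src


# Hypothetical fragment standing in for the action=render output.
SAMPLE_HTML = (
    '<div><p>Intro</p><table class="infobox"><tr><td>'
    '<a href="/wiki/File:Flag.svg"><img alt="" src="//upload.example/Flag.svg" width="125"></a>'
    '</td></tr></table><img src="//upload.example/Other.png"></div>'
)

print(first_image_src(SAMPLE_HTML))
```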
误落风尘

2020-11-30 18:34

I think not, but you can capture the image by running a link parser over the HTML document.

挽巷

2020-11-30 18:35
You can get the thumbnail of any Wikipedia page using prop=pageimages. For example:
```
http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100
```
The response will contain the full URL of the thumbnail.
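The request above could be built and its response read roughly as follows. This is a sketch, not a verified client: the function names are invented, no network call is made, and the JSON shape in `SAMPLE_RESPONSE` (pages keyed by page ID, each with a `thumbnail.source` field) is an assumption about what the API returns, with placeholder values.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint from the example above


def build_pageimages_url(title: str, thumb_size: int = 100) -> str:
    """Build the prop=pageimages query URL for one page title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "pageimages",
        "format": "json",
        "pithumbsize": thumb_size,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_thumbnail_url(response_text: str):
    """Return the first thumbnail URL found in the response, or None."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        thumbnail = page.get("thumbnail")
        if thumbnail:
            return thumbnail.get("source")
    return None


# Assumed response shape with placeholder values, not real API output.
SAMPLE_RESPONSE = json.dumps({
    "query": {"pages": {"12345": {
        "pageid": 12345,
        "title": "Al-Farabi",
        "thumbnail": {"source": "http://upload.example/thumb.jpg", "width": 100, "height": 120},
    }}}
})

print(build_pageimages_url("Al-Farabi"))
print(extract_thumbnail_url(SAMPLE_RESPONSE))
```

Fetching the URL, for example with `urllib.request.urlopen`, and passing the body to `extract_thumbnail_url` would complete the round trip.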
独厮守ぢ

2020-11-30 18:39

http://en.wikipedia.org/w/api.php

Look at prop=images.

It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.: action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

or to calculate the URL via the filename's hash.
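The hash-based calculation can be sketched as below. It assumes the usual MediaWiki hashed upload layout, where the directory is derived from the MD5 of the file name with spaces replaced by underscores. The `base` argument is an assumption as well: a file may live under the Commons path or under a wiki-specific path, and this sketch cannot tell which. `upload_url_from_filename` is a made-up helper name.

```python
import hashlib
from urllib.parse import quote


def upload_url_from_filename(
    filename: str,
    base: str = "https://upload.wikimedia.org/wikipedia/commons",
) -> str:
    """Guess the full-size image URL from a file name (without the 'File:' prefix)."""
    # Normalise the name: underscores for spaces, first letter upper-cased.
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Layout: <base>/<first hex digit>/<first two hex digits>/<file name>
    return f"{base}/{digest[0]}/{digest[:2]}/{quote(name)}"


print(upload_url_from_filename("Example file.jpg"))
```

Compared with the extra imageinfo call, this saves a request but gives no confirmation that the file actually exists at the computed location.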

Unfortunately, while the array of images returned by prop=images is in the order they are found on the page, the first cannot be guaranteed to be the image in the infobox, because sometimes a page will include an image before the infobox (most of the time icons for metadata about the page, e.g. "this article is locked").

Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.
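That heuristic might be sketched like this. The matching rule (case-insensitive, ignoring punctuation, namespace prefix and file extension) is one possible interpretation, not something the API prescribes, and `guess_infobox_image` and the sample file names are hypothetical.

```python
import re


def _normalize(text: str) -> str:
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def guess_infobox_image(page_title: str, image_titles):
    """Return the first image title whose name contains the page title, or None."""
    needle = _normalize(page_title)
    if not needle:
        return None
    for title in image_titles:
        # Drop a leading "File:"/"Image:" prefix and the file extension.
        stem = title.split(":", 1)[-1].rsplit(".", 1)[0]
        if needle in _normalize(stem):
            return title
    return None


# Hypothetical list in the shape returned by prop=images.
IMAGES = [
    "File:Padlock-silver.svg",
    "File:Flag of Italy.svg",
    "File:Italy (orthographic projection).svg",
]

print(guess_infobox_image("Italy", IMAGES))
```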
