Accessing the main picture of a Wikipedia page by API

没有蜡笔的小新 2020-11-30 18:15

Is there any way to access the thumbnail picture of a Wikipedia page through an API? I mean the image in the box at the top right of the page. Are there any APIs for that?

14 Answers
  • 2020-11-30 18:24

    See this related question on an API for Wikipedia. However, I don't know whether it is possible to retrieve the thumbnail picture through an API.

    You can also consider just parsing the web page to find the image URL, and retrieve the image that way.

  • 2020-11-30 18:29

    Here is the list of XPaths that I have found to work for about 95 percent of articles. The main ones are 1, 2, 3 and 4. Many articles are not formatted consistently, and those are the edge cases.

    You can use a DOM parsing library to fetch the image using these XPaths:

    static NSString   *kWikipediaImageXPath2    =   @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
    static NSString   *kWikipediaImageXPath3    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
    static NSString   *kWikipediaImageXPath1    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
    static NSString   *kWikipediaImageXPath4    =   @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
    static NSString   *kWikipediaImageXPath5    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
    static NSString   *kWikipediaImageXPath6    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
    static NSString   *kWikipediaImageXPath7    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";
    

    I used an Objective-C wrapper around libxml2.2 called Hpple to pull out the image URL. Hope this helps.
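
    For readers not working in Objective-C, the same fallback idea (try each XPath in turn and take the first hit) can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library's ElementTree, which needs well-formed markup and relative paths (so the leading `//` becomes `.//`), and `SAMPLE_PAGE` is a hand-written fragment, not a real Wikipedia page. Real pages would likely need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# The four "main" XPaths from the list above, rewritten as relative paths
# because ElementTree does not accept absolute ones.
IMAGE_XPATHS = [
    ".//*[@id='mw-content-text']/div[1]/table/tr[2]/td/a/img",
    ".//*[@id='mw-content-text']/div[1]/div/table/tr[2]/td/a/img",
    ".//*[@id='mw-content-text']/div[1]/table/tr[1]/td/a/img",
    ".//*[@id='mw-content-text']/div[2]/table/tr[2]/td/a/img",
]


def find_main_image(root):
    """Return the src of the first <img> matched by any XPath, else None."""
    for xpath in IMAGE_XPATHS:
        img = root.find(xpath)
        if img is not None and img.get("src"):
            return img.get("src")
    return None


# Hypothetical, well-formed fragment standing in for an article page.
SAMPLE_PAGE = """
<html><body><div id="mw-content-text"><div>
  <table>
    <tr><td>Example</td></tr>
    <tr><td><a href="/wiki/File:Example.png"><img src="//upload.example/Example.png"/></a></td></tr>
  </table>
</div></div></body></html>
"""

root = ET.fromstring(SAMPLE_PAGE.strip())
print(find_main_image(root))
```

    Keeping the XPaths in an ordered list makes it easy to add the edge-case paths later without touching the lookup logic.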

  • 2020-11-30 18:33

    Way 1: You can try a query like this:

    http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0

    In the response, you can see the Image tag:

    <Item>
    <Text xml:space="preserve">Italy national rugby union team</Text>
    <Description xml:space="preserve">
    The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
    </Description>
    <Url xml:space="preserve">
    http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
    </Url>
    <Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
    </Item>
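
    As a rough sketch of how the Image tag in a response like this could be read out in Python: the helper names `local_name` and `extract_images` are made up for illustration, and `SAMPLE` is a trimmed copy of the item above wrapped in a hypothetical `Section` element. Tags are matched by local name on the assumption that a real response may declare an XML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_images(xml_text):
    """Return (text, image_url) pairs for every Item that has an Image."""
    results = []
    root = ET.fromstring(xml_text)
    for item in root.iter():
        if local_name(item.tag) != "Item":
            continue
        text, source = None, None
        for child in item:
            name = local_name(child.tag)
            if name == "Text":
                text = (child.text or "").strip()
            elif name == "Image":
                source = child.get("source")
        if source:
            results.append((text, source))
    return results


# Trimmed copy of the item shown above; the Section wrapper is an assumption.
SAMPLE = """<Section><Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item></Section>"""

for text, url in extract_images(SAMPLE):
    print(text, url)
```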
    

    Way 2: Use the query http://en.wikipedia.org/w/index.php?action=render&title=italy

    This returns the raw HTML of the page, and you can extract the image with something like PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net).

    I don't have time to write the code for you; this is just some advice. Thanks.
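
    A minimal sketch of Way 2 in Python rather than PHP, using only the standard library's `html.parser`. The class and function names are hypothetical, and the input is assumed to be HTML that has already been downloaded. Which of the collected images is the infobox picture still has to be decided by the caller, for example by taking the first one.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html_text):
    """Return all image URLs found in the given HTML string."""
    parser = ImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Hypothetical fragment standing in for the action=render output.
sample = '<div><p>x</p><img src="a.png"><img alt="no src"/><img src="b.png"/></div>'
print(image_sources(sample))
```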

  • 2020-11-30 18:34

    I think not, but you can capture the image by parsing the links in the HTML document.

  • 2020-11-30 18:35

    You can get the thumbnail of any Wikipedia page using prop=pageimages. For example:

    http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100
    

    The response contains the full URL of the thumbnail.
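
    A small Python sketch of this approach: one helper builds the query URL and another reads the thumbnail out of the decoded JSON. The function names are made up, `SAMPLE_RESPONSE` is invented data in the shape the API is expected to return with format=json, and the `https` endpoint is an assumption (the URL above uses `http`). Fetching the URL is left out so the example runs offline.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def pageimages_url(title, size=100):
    """Build a prop=pageimages query URL for one page title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "pageimages",
        "format": "json",
        "pithumbsize": size,
    }
    return API_URL + "?" + urlencode(params)


def thumbnail_url(response):
    """Return the first thumbnail URL in a decoded JSON response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumb = page.get("thumbnail")
        if thumb and "source" in thumb:
            return thumb["source"]
    return None


# Invented response; the page id and image URL are placeholders.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Al-Farabi",
                "thumbnail": {
                    "source": "https://upload.example/thumb/100px-Al-Farabi.jpg",
                    "width": 100,
                    "height": 130,
                },
            }
        }
    }
}

print(pageimages_url("Al-Farabi"))
print(thumbnail_url(SAMPLE_RESPONSE))
```

    Pages without a lead image simply have no `thumbnail` key, so the helper returns None instead of raising.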

  • 2020-11-30 18:39

    http://en.wikipedia.org/w/api.php

    Look at prop=images.

    It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.: action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

    or to calculate the URL via the filename's hash.
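
    The hash-based option can be sketched as follows. This assumes the commonly described Wikimedia layout, where the directory is derived from the MD5 of the file name with spaces replaced by underscores, and it assumes the file lives on Commons (files uploaded locally sit under a per-wiki path such as `wikipedia/en` instead). `upload_url` is a hypothetical helper, not part of any API.

```python
import hashlib
from urllib.parse import quote


def upload_url(filename, base="https://upload.wikimedia.org/wikipedia/commons"):
    """Guess the full-size upload URL for a file name (assumed layout)."""
    # Drop an optional namespace prefix such as "File:" or "Image:".
    for prefix in ("File:", "Image:"):
        if filename.startswith(prefix):
            filename = filename[len(prefix):]
    # Stored names use underscores and a capitalised first letter.
    name = filename.replace(" ", "_")
    name = name[:1].upper() + name[1:]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Directory levels: first hex digit, then first two hex digits.
    return f"{base}/{digest[0]}/{digest[:2]}/{quote(name)}"


print(upload_url("Image:Flag of Italy.svg"))
```

    Because the location of the file (Commons or a local wiki) cannot be known from the name alone, the imageinfo call is the safer of the two options.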

    Unfortunately, while the array of images returned by prop=images is in the order they are found on the page, the first one cannot be guaranteed to be the infobox image, because a page sometimes includes an image before the infobox (most often an icon for metadata about the page, e.g. "this article is locked").

    Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.
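
    That best-guess heuristic might look like this in Python. The function names are hypothetical, and `SAMPLE_RESPONSE` is invented data in the shape a prop=images query is expected to return with format=json. The match is a plain case-insensitive substring test, so it remains a guess, as the sample shows.

```python
def normalize(text):
    """Make titles comparable: underscores to spaces, case-insensitive."""
    return text.replace("_", " ").casefold()


def guess_infobox_image(response):
    """Return the first image title containing the page title, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        needle = normalize(page.get("title", ""))
        for image in page.get("images", []):
            if needle and needle in normalize(image["title"]):
                return image["title"]
    return None


# Invented response; the page id and file names are placeholders.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "999": {
                "pageid": 999,
                "ns": 0,
                "title": "Italy",
                "images": [
                    {"ns": 6, "title": "File:Commons-logo.svg"},
                    {"ns": 6, "title": "File:Flag of Italy.svg"},
                    {"ns": 6, "title": "File:Italy (orthographic projection).svg"},
                ],
            }
        }
    }
}

print(guess_infobox_image(SAMPLE_RESPONSE))
```

    The returned title can then be passed to the imageinfo query described above to obtain the full URL.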
