Is there any way I can access the thumbnail picture of any wikipedia page by using an API? I mean the image on the top right side in box. Is there any APIs for that?
This is good way to get the Main Image of a page in wikipedia
http://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=India
Lets take Example of Page http://en.wikipedia.org/wiki/index.html?curid=57570 to get Main Pic
Check out
prop=pageprops
action=query&pageids=57570&prop=pageprops&format=json
Results Page Data Eg.
{ "pages" : { "57570":{
"pageid":57570,
"ns":0,
"title":"Sachin Tendulkar",
"pageprops" : {
"defaultsort":"Tendulkar,Sachin",
"page_image":"Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
"wikibase_item":"Q9488"
}
}
}
}}
We get main Pic file name this result as
** (wikiId).pageprops.page_image = Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg**
Now as we have Image file name we will have to make another Api Call to get full image path from file name as follows
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
Eg.
action=query&titles=Image:Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg&prop=imageinfo&iiprop=url
Returns Array of Image Data having url in it as http://upload.wikimedia.org/wikipedia/commons/3/35/Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg
I have written some code that gets main image (full URL) by Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.
The challenge was that when queried for a specific title, Wikipedia returns multiple image filenames (without path). Furthermore, the secondary search (I used the code varatis posted in this thread - thanks!) returns URLs of all images found based on the image filename that was searched, regardless of the original article title. After all this, we may end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match... a bit complicated, but it works :)
Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list - if there is any interest let me know.
Pre:
protected static $baseurl = "http://en.wikipedia.org/w/api.php";
Main function - get image URL from title:
public static function getImageURL($title)
{
$images = self::getImageFilenameObj($title); // returns JSON object
if (!$images) return '';
foreach ($images as $image)
{
// get object of image URL for given filename
$imgjson = self::getFileURLObj($image->title);
// return first image match
foreach ($imgjson as $img)
{
// get URL for image
$url = $img->imageinfo[0]->url;
// no image found
if (!$url) continue;
// filter generic images
if (self::isGeneric($url)) continue;
// match found
return $url;
}
}
// match not found
return '';
}
== The following functions are called by the main function above ==
Get JSON object (filenames) by title:
public static function getImageFilenameObj($title)
{
try // see if page has images
{
// get image file name
$json = json_decode(
self::retrieveInfo(
self::$baseurl . '?action=query&titles=' .
urlencode($title) . '&prop=images&format=json'
))->query->pages;
/** The foreach is only to get around
* the fact that we don't have the id.
*/
foreach ($json as $id) { return $id->images; }
}
catch(exception $e) // no images
{
return NULL;
}
}
Get JSON object (URLs) by filename:
public static function getFileURLObj($filename)
{
try // resolve URL from filename
{
return json_decode(
self::retrieveInfo(
self::$baseurl . '?action=query&titles=' .
urlencode($filename) . '&prop=imageinfo&iiprop=url&format=json'
))->query->pages;
}
catch(exception $e) // no URLs
{
return NULL;
}
}
Filter out generic images:
public static function isGeneric($url)
{
$generic_strings = array(
'_gray.svg',
'icon',
'Commons-logo.svg',
'Ambox',
'Text_document_with_red_question_mark.svg',
'Question_book-new.svg',
'Canadese_kano',
'Wiki_letter_',
'Edit-clear.svg',
'WPanthroponymy',
'Compass_rose_pale',
'Us-actor.svg',
'voting_box',
'Crystal_',
'transportation_inv',
'arrow.svg',
'Quill_and_ink-US.svg',
'Decrease2.svg',
'Rating-',
'template',
'Nuvola_apps_',
'Mergefrom.svg',
'Portal-',
'Translation_to_',
'/School.svg',
'arrow',
'Symbol_',
'stub',
'Unbalanced_scales.svg',
'-logo.',
'P_vip.svg',
'Books-aj.svg_aj_ashton_01.svg',
'Film',
'/Gnome-',
'cap.svg',
'Missing',
'silhouette',
'Star_empty.svg',
'Music_film_clapperboard.svg',
'IPA_Unicode',
'symbol',
'_highlighting_',
'pictogram',
'Red_pog.svg',
'_medal_with_cup',
'_balloon',
'Feature',
'Aiga_'
);
foreach ($generic_strings as $str)
{
if (stripos($url, $str) !== false) return true;
}
return false;
}
Comments welcome.
I there is a way to reliably get a main image for a wikipedia page - the Extension called PageImages
The PageImages extension collects information about images used on a page.
Its aim is to return the single most appropriate thumbnail associated with an article, attempting to return only meaningful images, e.g. not those from maintenance templates, stubs or flag icons. Currently it uses the first non-meaningless image used in the page.
https://www.mediawiki.org/wiki/Extension:PageImages
Just add the prop pageimages to your API Query:
/w/api.php?action=query&prop=pageimages&titles=Somepage&format=xml
This reliably filters out annoying default images and prevents you from having to filter them yourself! The extension is installed on all the main wikipedia pages...
I'm sorry for not answering specifically your question about the main image. But here's some code to get a list of all images:
function makeCall($url) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
return curl_exec($curl);
}
function wikipediaImageUrls($url) {
$imageUrls = array();
$pathComponents = explode('/', parse_url($url, PHP_URL_PATH));
$pageTitle = array_pop($pathComponents);
$imagesQuery = "http://en.wikipedia.org/w/api.php?action=query&titles={$pageTitle}&prop=images&format=json";
$jsonResponse = makeCall($imagesQuery);
$response = json_decode($jsonResponse, true);
$imagesKey = key($response['query']['pages']);
foreach($response['query']['pages'][$imagesKey]['images'] as $imageArray) {
if($imageArray['title'] != 'File:Commons-logo.svg' && $imageArray['title'] != 'File:P vip.svg') {
$title = str_replace('File:', '', $imageArray['title']);
$title = str_replace(' ', '_', $title);
$imageUrlQuery = "http://en.wikipedia.org/w/api.php?action=query&titles=Image:{$title}&prop=imageinfo&iiprop=url&format=json";
$jsonUrlQuery = makeCall($imageUrlQuery);
$urlResponse = json_decode($jsonUrlQuery, true);
$imageKey = key($urlResponse['query']['pages']);
$imageUrls[] = $urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'];
}
}
return $imageUrls;
}
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Saturn_%28mythology%29'));
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel'));
I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/1/10/Arch_of_SeptimiusSeverus.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/8/81/Ivan_Akimov_Saturn_.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/d/d7/Lucius_Appuleius_Saturninus.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Polidoro_da_Caravaggio_-_Saturnus-thumb.jpg
[4] => http://upload.wikimedia.org/wikipedia/commons/b/bd/Porta_Maggiore_Alatri.jpg
[5] => http://upload.wikimedia.org/wikipedia/commons/6/6a/She-wolf_suckles_Romulus_and_Remus.jpg
[6] => http://upload.wikimedia.org/wikipedia/commons/4/45/Throne_of_Saturn_Louvre_Ma1662.jpg
)
And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/e/e9/BmRKEL.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/3/3f/BmRKELS.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Bundesarchiv_Bild_101I-655-5976-04%2C_Russland%2C_Sturzkampfbomber_Junkers_Ju_87_G.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/6/62/Bundeswehr_Kreuz_Black.svg
[4] => http://upload.wikimedia.org/wikipedia/commons/9/99/Flag_of_German_Reich_%281935%E2%80%931945%29.svg
[5] => http://upload.wikimedia.org/wikipedia/en/6/64/HansUlrichRudel.jpeg
[6] => http://upload.wikimedia.org/wikipedia/commons/8/82/Heinkel_He_111_during_the_Battle_of_Britain.jpg
[7] => http://upload.wikimedia.org/wikipedia/commons/6/66/Regulation_WW_II_Underwing_Balkenkreuz.png
)
Note that the URL changed a bit on the 6th element of the second array. It's what @JosephJaber was warning about in his comment above.
Hope this helps someone.
You can also use cocoa Pod called SDWebImage
Code sample (remember to also add import SDWebImage
):
func requestInfo(flowerName: String) {
let parameters : [String:String] = [
"format" : "json",
"action" : "query",
"prop" : "extracts|pageimages",//pageimages allows fetch imagePath
"exintro" : "",
"explaintext" : "",
"titles" : flowerName,
"indexpageids" : "",
"redirects" : "1",
"pithumbsize" : "500"//specify image size in px
]
AF.request(wikipediaURL, method: .get, parameters: parameters).responseJSON { (response) in
switch response.result {
case .success(let value):
print("Got the wikipedia info.")
print(response)
let flowerJSON : JSON = JSON(response.value!)
let pageid = flowerJSON["query"]["pageids"][0].stringValue
let flowerDescription = flowerJSON["query"]["pages"][pageid]["extract"].stringValue
let flowerImageURL = flowerJSON["query"]["pages"][pageid]["thumbnail"]["source"].stringValue //fetching Image URL
self.wikiInfoLabel.text = flowerDescription
self.imageView.sd_setImage(with: URL(string : flowerImageURL))//imageView updated with Wiki Image
case .failure(let error):
print(error)
}
}
}