wikipedia-api

How to get the result of “all pages with prefix” using the Wikipedia API?

Submitted on 2019-12-12 18:03:25
Question: I wish to use the Wikipedia API to extract the result of this page: http://en.wikipedia.org/wiki/Special:PrefixIndex when searching "something" on it, for example this: http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4 Then I would like to access each of the resulting pages and extract their information. What API call might I use?

Answer 1: You can use list=allpages and specify apprefix. For example: http://en.wikipedia.org/w/api.php?format=xml&action=query&list…
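A minimal sketch of that call in Python, assuming the requests library is available; the apprefix and apnamespace values mirror the Special:PrefixIndex example above (prefix "tal", namespace 4), with the prefix capitalized because stored titles start with an uppercase letter:

```python
import requests

# List pages whose titles start with "Tal" in namespace 4 (Wikipedia:),
# mirroring Special:PrefixIndex?prefix=tal&namespace=4.
params = {
    "action": "query",
    "list": "allpages",
    "apprefix": "Tal",
    "apnamespace": 4,
    "aplimit": 50,
    "format": "json",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
for page in resp.json()["query"]["allpages"]:
    print(page["pageid"], page["title"])
```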

How to get coordinates from a Wikipedia page through the API?

Submitted on 2019-12-12 09:43:28
Question: I want to get the coordinates of a Wikipedia page through their API, passing the page title as the 'titles' parameter. I have searched SO for a solution, but it seems they are scraping the page and then extracting. Is it possible through their API?

Answer 1: You need to use the Wikipedia API. For your example with Kinkaku-ji the query will be: https://en.wikipedia.org/w/api.php?action=query&prop=coordinates&titles=Kinkaku-ji For more than one title, use a pipe to separate them: titles=Kinkaku-ji|Paris|…
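A short Python sketch of the same query, assuming the requests library; coordinates come back under each page's coordinates key (provided by the GeoData extension), and pages without stored coordinates simply omit the key:

```python
import requests

params = {
    "action": "query",
    "prop": "coordinates",
    "titles": "Kinkaku-ji|Paris",   # pipe-separated titles, as in the answer above
    "format": "json",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
for page in resp.json()["query"]["pages"].values():
    for coord in page.get("coordinates", []):
        print(page["title"], coord["lat"], coord["lon"])
```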

Wikimedia API get generator metadata

Submitted on 2019-12-12 04:09:50
Question: I want to get pages from Wikimedia Commons, and it seems that I still have not understood how to use the Wikimedia API. I use the following query: https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=url|size|mime|mediatype|extmetadata&iiurlwidth=150&generator=search&gsrsearch=transformation&gsrnamespace=6&gsrlimit=9&gsroffset=0&gsrinfo=totalhits (see it in the API Sandbox). This works great, except that I don't get the gsrinfo / generator metadata, but I need the…
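For reference, the same generator query issued from Python (a sketch, assuming the requests library); it shows where the per-page imageinfo lands in the response. Whether the gsrinfo totals are exposed alongside a generator is exactly the open question here, so the sketch only prints the page-level data:

```python
import requests

params = {
    "action": "query",
    "generator": "search",
    "gsrsearch": "transformation",
    "gsrnamespace": 6,          # File: namespace on Commons
    "gsrlimit": 9,
    "prop": "imageinfo",
    "iiprop": "url|size|mime|mediatype|extmetadata",
    "iiurlwidth": 150,          # makes imageinfo include a 150px thumburl
    "format": "json",
}
resp = requests.get("https://commons.wikimedia.org/w/api.php", params=params).json()
for page in resp.get("query", {}).get("pages", {}).values():
    info = page["imageinfo"][0]
    print(page["title"], info["mime"], info["thumburl"])
```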

Fetch the description of an article from Wikipedia

Submitted on 2019-12-12 02:17:59
Question: I am trying to make an API call to Wikipedia through http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the result is full of HTML and CSS tags. Is there a way to fetch only plain text, without tags? Thanks!

Edit 1:

    $json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
    $txt = strip_tags($json->text);
    var_dump($json);

Null is displayed.

Answer 1: The question was partially answered here: $url = 'http://en.wikipedia.org/w…
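One commonly used route to plain text (an alternative, not necessarily the one in the truncated answer) is the TextExtracts prop: action=query&prop=extracts with explaintext, which returns text with the markup already stripped. A minimal Python sketch, assuming the requests library:

```python
import requests

params = {
    "action": "query",
    "prop": "extracts",
    "explaintext": 1,      # return plain text instead of HTML
    "titles": "Petunia",
    "format": "json",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()
for page in resp["query"]["pages"].values():
    print(page["extract"][:500])   # first 500 characters of the plain-text extract
```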

Wikipedia API to get articles belonging to a category

Submitted on 2019-12-11 20:21:26
Question: I would like to get a number of pages belonging to a specific category, say sports or politics, and extract various sections from the pages, such as the abstract, the title, etc. Is there an API to do that? If not, are there any Wikipedia dumps organized by category? Thanks

Answer 1: You're looking for the categorymembers API. Notice that you will only get pages directly in that single category, not subcategories, and there are no intersection operators. You will probably want to use that…
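A minimal Python sketch of the categorymembers call mentioned in the answer, assuming the requests library and a hypothetical category name; continuation handles paging, and subcategories would have to be walked separately:

```python
import requests

def category_members(category, limit=500):
    """Yield titles of pages directly inside one category (no subcategory recursion)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        data = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])   # follow the continuation token

for title in category_members("Sports"):
    print(title)
```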

simplexml_load_file empty array

Submitted on 2019-12-11 17:51:25
Question: I try to load this document:

    $url = "http://en.wikipedia.org/w/api.php?action=query&titles=Electrophoresis&prop=langlinks&lllimit=500";

When I open it in a browser, everything is fine. When I do this:

    ini_set('user_agent', 'XX123456789 (localhost; myemailaddress)'); // sets info for authentication
    $content = file_get_contents($url);
    var_dump($content);

it returns the same XML document my browser shows. However, when I try

    $content_arrays = simplexml_load_file($content);
    echo '<pre>', print_r(…

fetch() with the Wikipedia API results in “TypeError: NetworkError when attempting to fetch resource.”

Submitted on 2019-12-11 14:38:02
Question:

    fetch('https://en.wikipedia.org/w/api.php?action=query&titles=Main%20Page&prop=revisions&rvprop=content&format=json')
      .then(function(response) {
        if (response.status !== 200) {
          console.log('Looks like there was a problem. Status Code: ' + response.status);
          return;
        }
        // Examine the text in the response
        response.json().then(function(data) {
          console.log(data);
        });
      })
      .catch(function(err) {
        document.write('Fetch Error :-S', err);
      });

The fetch address I'm using is listed here: https://www…

How to parse attribute values inside {{}} (curly braces) in an infobox

Submitted on 2019-12-11 11:05:08
Question: Within an infobox on Wikipedia, some attribute values are themselves inside curly braces {{}}. Sometimes they contain links as well. I need the values inside the braces, as they are displayed on the Wikipedia web page. I have read that these are templates. Can anyone give me a link or guide me on how to deal with this?

Answer 1: Double curly braces {{}} define a call to some kind of magic word, variable, parser function, or template. Help can be found on MediaWiki.org/.../Manual:Magic_words. The little lines that look like |…
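If the goal is to pull values out of those {{...}} template calls programmatically, one option (an assumption here, not something the answer mentions) is the mwparserfromhell Python library, which parses wikitext and exposes templates and their parameters; the sample wikitext below is invented for illustration:

```python
import mwparserfromhell

wikitext = "{{Infobox settlement | name = Chicago | population_total = {{formatnum:2695598}} }}"
code = mwparserfromhell.parse(wikitext)

# filter_templates() is recursive, so the nested {{formatnum:...}} call
# shows up as its own template in addition to appearing inside the
# infobox's parameter value.
for template in code.filter_templates():
    print("template:", str(template.name).strip())
    for param in template.params:
        print("  ", str(param.name).strip(), "=", str(param.value).strip())
```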

How to get Titles from a Wikipedia Page

Submitted on 2019-12-11 10:18:49
Question: Is there a direct API call where I can get the section titles from a Wikipedia page? For example, from http://en.wikipedia.org/wiki/Chicago I want to retrieve the following:

    1 History
    1.1 Rapid growth and development
    1.2 20th and 21st centuries
    2 Geography
    2.1 Topography
    2.2 Climate
    3 Cityscape
    3.1 Architecture
    and so on

I have looked at http://www.mediawiki.org/wiki/API:Lists/All, but couldn't find an action which gives me the above list from a wiki page.

Answer 1: What you want is not a list of pages, so…
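Since the answer is cut off, here is one route, offered as an assumption rather than the answerer's method: the listing above is the page's table of contents, and action=parse with prop=sections returns each heading with its number and text. A Python sketch, assuming the requests library:

```python
import requests

params = {
    "action": "parse",
    "page": "Chicago",
    "prop": "sections",
    "format": "json",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()
for section in resp["parse"]["sections"]:
    print(section["number"], section["line"])   # e.g. "2.1 Topography"
```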

How would you handle different formats of dates?

Submitted on 2019-12-11 09:13:27
Question: I have several different date formats, for example:

    27 - 28 August 663 CE
    22 August 1945
    19 May
    May 4 1945 – August 22 1945
    5/4/1945
    2-7-1232
    03-4-1020
    1/3/1 (year 1)
    09/08/0 (year 0)

Note they are all different formats, in different orders; some have two months, some only one. I tried to use moment.js with no results, and I also tried date.js, but no luck. I tried to do some splitting:

    dates.push({ Time : [] });
    function doSelect(text) {
        return $wikiDOM.find(".infobox th").filter(function() {…
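As an illustration only (this is not from the question, and it does not cover every format above), Python's dateutil parser handles many of the single-date strings once ranges and era suffixes are split off; the ambiguous numeric forms still need an explicit dayfirst choice, and unparseable pieces are simply reported as None:

```python
import re
from dateutil import parser

samples = ["22 August 1945", "May 4 1945", "5/4/1945", "2-7-1232",
           "27 - 28 August 663 CE", "May 4 1945 – August 22 1945"]

def rough_parse(text):
    # Split ranges on a dash surrounded by spaces, drop a trailing "CE"
    # era marker, then let dateutil guess the rest of the format.
    parts = re.split(r"\s[–-]\s", text)
    results = []
    for part in parts:
        part = part.replace("CE", "").strip()
        try:
            results.append(parser.parse(part, dayfirst=True))
        except (ValueError, OverflowError):
            results.append(None)   # format not recognised by this sketch
    return results

for s in samples:
    print(s, "->", rough_parse(s))
```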