mediawiki-api

How to access Wikipedia from R?

♀尐吖头ヾ submitted on 2019-11-29 03:16:29
Question: Is there any package for R that allows querying Wikipedia (most probably using the MediaWiki API) to get a list of available articles relevant to such a query, as well as to import selected articles for text mining? Answer 1: Use the RCurl package for retrieving info, and the XML or RJSONIO packages for parsing the response. If you are behind a proxy, set your options. opts <- list( proxy = "136.233.91.120", proxyusername = "mydomain\\myusername", proxypassword = 'whatever', proxyport = 8080 ) Use the

How to get plain text out of wikipedia

て烟熏妆下的殇ゞ submitted on 2019-11-28 18:02:25
Question: I've been searching for about 2 months now to find a script that gets the Wikipedia description section only. (It's for a bot I'm building, not for IRC.) That is, when I say /wiki bla bla bla it will go to the Wikipedia page for bla bla bla, get the following, and return it to the chatroom: "Bla Bla Bla" is the name of a song made by Gigi D'Agostino. He described this song as "a piece I wrote thinking of all the people who talk and talk without saying anything". The prominent but nonsensical

How to parse Wikipedia XML with PHP?

眉间皱痕 submitted on 2019-11-28 10:34:18
How to parse Wikipedia XML with PHP? I tried it with SimplePie, but I got nothing. Here is the link whose data I want to get: http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content&format=xml Edit code: <?php define("EMAIL_ADDRESS", "youlichika@hotmail.com"); $ch = curl_init(); $cv = curl_version(); $user_agent = "curl ${cv['version']} (${cv['host']}) libcurl/${cv['version']} ${cv['ssl_version']} zlib/${cv['libz_version']} <" . EMAIL_ADDRESS . ">"; curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); curl

Is there any API in Java to access wikipedia data

风流意气都作罢 submitted on 2019-11-28 06:51:16
I want to know: is there any API or query interface through which I can access Wikipedia data? MediaWiki, the wiki platform that Wikipedia uses, does have an HTTP-based API. See MediaWiki API. For example, to get pages with the title Stackoverflow, you call http://en.wikipedia.org/w/api.php?action=query&titles=Stackoverflow There are some (incomplete) Java wrappers around the API; see the Client Code - Java section of the API page for more detail. For use with Java, try http://code.google.com/p/wiki-java . It is only one class, but a great one! I had the same question and the closest I
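The HTTP call in the answer is language-independent, so it can be sketched without any wrapper library. The following is a minimal Python sketch (not Java) that only assembles the request URL. The `https` scheme, the `format=json` parameter, and the helper name `build_query_url` are assumptions added for illustration and are not part of the original answer.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer; the https scheme is an assumption.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, fmt="json"):
    """Build an action=query URL for a single page title (hypothetical helper)."""
    params = {"action": "query", "titles": title, "format": fmt}
    # urlencode percent-escapes the values and joins them with "&".
    return API_ENDPOINT + "?" + urlencode(params)


url = build_query_url("Stackoverflow")
print(url)
```

The resulting URL can be fetched with any HTTP client, in Java or otherwise; only the query string matters to the API.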

How to get image URL property from Wikidata item by API?

為{幸葍}努か submitted on 2019-11-27 22:35:42
I've made an Android app that uses the JSON Google image search API to provide images, but I have noticed that Google have stopped supporting it. I have also discovered that Wikidata sometimes provides an image property on some items; however, I can't seem to get the URL location of the image using the Wikidata API. Is there any way to get the image URL property from items in Wikidata? If some Wikidata item (with ID: Qxxx) has an image (P18) property, you can access it through the MediaWiki API: https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Qxxx&property=P18 The response will include:
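A minimal Python sketch of reading a `wbgetclaims` response, assuming the file name sits under `claims`, then `P18`, then `mainsnak`, `datavalue`, `value`. The sample payload below is hand-written for illustration and is not actual API output. Turning the file name into a URL through the `Special:FilePath` page on Wikimedia Commons is likewise an assumption of this sketch, as is the helper name `image_urls_from_claims`.

```python
import json
from urllib.parse import quote

# Assumed URL pattern: Special:FilePath on Commons redirects to the file itself.
COMMONS_FILE_PATH = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def image_urls_from_claims(response):
    """Return one Commons URL per P18 claim found in a parsed wbgetclaims response."""
    urls = []
    for claim in response.get("claims", {}).get("P18", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if not datavalue:
            continue  # claims without a concrete value carry no datavalue
        filename = datavalue["value"].replace(" ", "_")
        urls.append(COMMONS_FILE_PATH + quote(filename))
    return urls


# Hand-written sample in the assumed response shape (illustrative only).
SAMPLE = json.loads(
    '{"claims": {"P18": [{"mainsnak": {"snaktype": "value", "property": "P18",'
    ' "datavalue": {"value": "Example image.jpg", "type": "string"}}}]}}'
)
print(image_urls_from_claims(SAMPLE))
```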

Parsing a Wikipedia dump

℡╲_俬逩灬. submitted on 2019-11-27 14:02:42
For example, using this Wikipedia dump: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm Is there an existing library for Python that I can use to create an array with the mapping of subjects and values? For example: {height_ft,6},{nationality, American} It looks like you really want to be able to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built-in XML packages to extract the page content from the API's response, then pass that content into mwlib's
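For the simple subject-to-value mapping the question asks about, a naive sketch is possible without a full markup parser. The Python below is not mwlib and does not use its API; it is a regular-expression shortcut that assumes every template parameter sits on its own line in the form `| key = value`. The sample wikitext is hand-written, and nested templates, multi-line values and links are not handled.

```python
import re

# One "| key = value" parameter per line; [ \t] keeps the match on a single line.
PARAM_RE = re.compile(
    r"^[ \t]*\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def template_params(wikitext):
    """Map parameter names to raw values (hypothetical helper, naive by design)."""
    return {key: value for key, value in PARAM_RE.findall(wikitext)}


# Hand-written sample wikitext (illustrative only).
sample = "{{Infobox person\n| height_ft = 6\n| nationality = American\n}}"
params = template_params(sample)
print(params)
```

For real articles, a dedicated parser such as the mwlib library named in the answer is the more robust route.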

Get Text Content from mediawiki page via API

我的梦境 submitted on 2019-11-27 11:01:14
I'm quite new to MediaWiki, and now I have a bit of a problem. I have the title of some wiki page, and I want to get just the text of said page using api.php, but all that I have found in the API is a way to obtain the wiki content of the page (with wiki markup). I used this HTTP request... /api.php?action=query&prop=revisions&rvlimit=1&rvprop=content&format=xml&titles=test But I need only the textual content, without the wiki markup. Is that possible with the MediaWiki API? I don't think it is possible using the API to get just the text. What has worked for me was to request the HTML page
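Once the HTML of a page is in hand, the markup can be stripped with the standard library. The following is a minimal Python sketch of that step only; it is not the answerer's code. The class name `TextExtractor`, the list of block-level tags, and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")  # keep neighbouring blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = ("<p><b>Test</b> is a <a href='/wiki/Page'>page</a>.</p>"
          "<p>Second paragraph.</p><script>var x = 1;</script>")
print(html_to_text(sample))
```

Navigation menus, edit links and reference markers on a real wiki page would still need to be filtered out separately.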

$getJSON and for loop issue

痞子三分冷 submitted on 2019-11-27 08:07:34
This is to populate a table with the number of results that are returned from the MediaWiki API query /api.php?action=query&list=querypage&qppage=BrokenRedirects . The number of results is then added to the id, for example: // BrokenRedirects $.getJSON('/api.php?action=query&list=querypage&qppage=BrokenRedirects&format=json', function (data) { $('#BrokenRedirects').text(data.query.querypage.results.length); }); But as it's being repeated another 7 times, I made the arguments for qppage into an array and used a for loop to shorten the overall code. var array = ['BrokenRedirects', 'DoubleRedirects',
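The same idea (one request per special page, one count per response) can be sketched outside the browser. The Python below is an illustration rather than a port of the jQuery code. It uses only the two page names visible in the excerpt, the helper names are hypothetical, and the sample response is hand-written in the shape the JavaScript reads (`data.query.querypage.results`).

```python
from urllib.parse import urlencode

# Only the two names visible in the excerpt; the rest of the list is elided there.
QUERY_PAGES = ["BrokenRedirects", "DoubleRedirects"]


def querypage_url(qppage):
    """Build the relative API URL for one special page (hypothetical helper)."""
    params = {"action": "query", "list": "querypage",
              "qppage": qppage, "format": "json"}
    return "/api.php?" + urlencode(params)


def count_results(data):
    """Count the entries in a parsed querypage response."""
    return len(data["query"]["querypage"]["results"])


# Each name gets its own URL, so no state is shared between iterations.
urls = {name: querypage_url(name) for name in QUERY_PAGES}

# Hand-written sample response (illustrative only).
sample = {"query": {"querypage": {"name": "BrokenRedirects", "results": [
    {"value": "0", "ns": 0, "title": "A"},
    {"value": "0", "ns": 0, "title": "B"},
]}}}
print(urls["BrokenRedirects"], count_results(sample))
```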

How to get Infobox from a Wikipedia article by Mediawiki API?

六月ゝ 毕业季﹏ submitted on 2019-11-27 07:26:45
Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox. http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext What I want is a query which will return only the Infobox data. Is this possible? You can do it with a URL call to the Wikipedia API like this: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0 Replace the titles= section with your page title, and format=xmlfm to format=json if you
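Both calls above return the whole first section, so the Infobox still has to be cut out of the wikitext on the client side. A minimal Python sketch of that step follows, assuming the template begins with `{{Infobox` and that braces only ever appear in `{{` and `}}` pairs. The function name `extract_infobox` and the sample wikitext are made up for illustration.

```python
import re


def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template, nested templates included, or None."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]  # matching close of the Infobox found
        else:
            i += 1
    return None  # unbalanced braces


# Hand-written sample wikitext (illustrative only).
sample = ("{{Infobox song\n| name = Example\n"
          "| length = {{Duration|m=4|s=3}}\n}}\n'''Example''' is a song.")
infobox = extract_infobox(sample)
print(infobox)
```

Counting brace pairs rather than searching for the first `}}` is what keeps nested templates inside the Infobox intact.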

How to retrieve Wiktionary word content?

£可爱£侵袭症+ submitted on 2019-11-27 05:53:49
How may Wiktionary's API be used to determine whether or not a word exists? Michael Mrozek The Wiktionary API can be used to query whether or not a word exists. Examples for existing and non-existing pages: http://en.wiktionary.org/w/api.php?action=query&titles=test http://en.wiktionary.org/w/api.php?action=query&titles=testx The first link provides examples on other types of formats that might be easier to parse. To retrieve the word's data in a small XHTML format (should more than existence be required), request the printable version of the page: http://en.wiktionary.org/w/index.php?title
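The existence check can be sketched in Python, assuming `format=json` is appended to the two query URLs above and that a page which does not exist is reported with a `missing` key. Both sample responses below are hand-written for illustration (the page ID is invented), and the helper name `page_exists` is hypothetical.

```python
def page_exists(response):
    """True if at least one returned page is neither missing nor invalid."""
    pages = response.get("query", {}).get("pages", {})
    return any(
        "missing" not in page and "invalid" not in page
        for page in pages.values()
    )


# Hand-written samples in the assumed response shape (illustrative only).
existing = {"query": {"pages": {"12345": {"pageid": 12345, "ns": 0, "title": "test"}}}}
missing = {"query": {"pages": {"-1": {"ns": 0, "title": "testx", "missing": ""}}}}

print(page_exists(existing), page_exists(missing))
```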