wikipedia-api

MediaWiki URL parameters without values

Submitted by 亡梦爱人 on 2019-12-19 06:14:10
Question: The query part of a URL seems to consist of key-value pairs separated by & and associated by =. I've taken to always using jQuery's $.param() function to URL-encode my query strings, because I find it makes my code more readable and maintainable. In the past couple of days I've been calling the MediaWiki API, but when cleaning up my working prototype's hard-coded URLs to use $.param(), I noticed that some MediaWiki API URLs include query parameters with keys but no values! api.php?action=query
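
A note on why those bare keys work (mine, not from the truncated thread): MediaWiki treats boolean parameters as true whenever the key is present, regardless of value, so an encoder that emits "key=" is equivalent to the bare "key" form. A minimal Python sketch:

```python
from urllib.parse import urlencode

# MediaWiki boolean flags such as "redirects" are keyed on presence, not
# value, so encoding them with an empty string ("redirects=") behaves the
# same as the bare "redirects" key seen in documentation examples.
params = {"action": "query", "titles": "Main Page", "redirects": "", "format": "json"}
print("https://en.wikipedia.org/w/api.php?" + urlencode(params))
# -> ...api.php?action=query&titles=Main+Page&redirects=&format=json
```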

Get first lines of Wikipedia Article

Submitted by 本小妞迷上赌 on 2019-12-18 11:53:32
Question: I have a Wikipedia article and I want to fetch the first z lines (or the first x characters, or the first y words; it doesn't matter) from it. The problem: I can get either the source wikitext (via the API) or the parsed HTML (via a direct HTTP request, possibly of the print version), but how can I find the first lines that are actually displayed? Normally the source (both HTML and wikitext) starts with the infoboxes and images, and the first real displayed text is somewhere further down in the code. For example: Albert
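
One route worth knowing about (my suggestion; the thread's answers are cut off in this excerpt) is the TextExtracts extension, enabled on Wikipedia, which returns the rendered lead text with infoboxes, images, and markup already stripped. A Python sketch, guessing "Albert Einstein" from the truncated example:

```python
import requests

# prop=extracts comes from the TextExtracts extension; exintro limits the
# result to the text before the first section heading, i.e. the lead.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # lead section only
        "explaintext": 1,  # plain text instead of HTML
        "titles": "Albert Einstein",
        "format": "json",
        "formatversion": 2,
    },
)
print(resp.json()["query"]["pages"][0]["extract"][:500])  # first ~500 chars
```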

How to get information from the infobox of a Wikipedia article using the Wikipedia API?

Submitted by ⅰ亾dé卋堺 on 2019-12-18 09:40:24
Question: I'm trying to get the lead actor's name from a movie's Wikipedia article. I tried different values for prop; prop=info seems the most relevant, but it doesn't contain the information in the article's infobox. See: http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Casino_Royale_(2006_film)&format=jsonfm Is it possible to extract infobox information using the Wikipedia API? Answer 1: The MediaWiki API doesn't understand infoboxes. So, you have basically two options: Parse the infobox
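
The first of those two options, parsing the infobox out of the raw wikitext, might look like this in Python with the third-party mwparserfromhell library (my choice of parser, not necessarily the answerer's):

```python
import requests
import mwparserfromhell  # third-party wikitext parser

# Fetch the article's raw wikitext...
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "Casino Royale (2006 film)",
        "format": "json",
        "formatversion": 2,
    },
)
wikitext = resp.json()["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

# ...then pull the "starring" field out of the first Infobox template.
code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    if template.name.strip().lower().startswith("infobox") and template.has("starring"):
        print(template.get("starring").value.strip_code().strip())
        break
```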

How to parse Wikipedia XML with PHP?

Submitted by 走远了吗. on 2019-12-17 19:29:46
Question: How do I parse Wikipedia XML with PHP? I tried it with SimplePie, but got nothing. Here is a link whose data I want to get: http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content&format=xml Edit code:

```php
<?php
define("EMAIL_ADDRESS", "youlichika@hotmail.com");
$ch = curl_init();
$cv = curl_version();
$user_agent = "curl ${cv['version']} (${cv['host']}) libcurl/${cv['version']} ${cv['ssl_version']} zlib
```
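
Whatever HTTP client fetches it, the response for that query nests each article's text under query/pages/page/revisions/rev. A sketch of walking that structure (in Python, to keep this page's examples in one language; the PHP equivalent would feed the body to simplexml_load_string):

```python
import requests
import xml.etree.ElementTree as ET

# Same query as the asker's URL, format=xml.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "generator": "allpages",
        "gaplimit": 2,
        "gapfilterredir": "nonredirects",
        "gapfrom": "Re",
        "prop": "revisions",
        "rvprop": "content",
        "format": "xml",
    },
)
root = ET.fromstring(resp.content)
for page in root.iter("page"):
    rev = page.find("./revisions/rev")
    print(page.get("title"), (rev.text or "")[:80] if rev is not None else "")
```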

Is there any API in Java to access Wikipedia data?

Submitted by 倖福魔咒の on 2019-12-17 18:17:26
Question: I want to know whether there is an API or a query interface through which I can access Wikipedia data. Answer 1: MediaWiki, the wiki platform that Wikipedia uses, does have an HTTP-based API. See MediaWiki API. For example, to get pages with the title "Stackoverflow", you call http://en.wikipedia.org/w/api.php?action=query&titles=Stackoverflow There are some (incomplete) Java wrappers around the API; see the Client Code - Java section of the API page for more detail. Answer 2: For use with Java, try http:/
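
Whichever Java wrapper you choose, it is ultimately issuing plain HTTP calls like the one in the answer; here is that same call sketched in Python only to keep this page's examples in one language:

```python
import requests

# The raw HTTP request that the various Java client libraries wrap.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={"action": "query", "titles": "Stackoverflow", "format": "json"},
)
print(resp.json()["query"]["pages"])
```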

Parsing a Wikipedia dump

Submitted by 一世执手 on 2019-12-17 10:59:18
Question: For example, using this Wikipedia dump: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm Is there an existing library for Python that I can use to create an array with the mapping of fields to values? For example: {height_ft: 6}, {nationality: American} Answer 1: It looks like you really want to be able to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built
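
mwlib is one option; as an alternative (my substitution, not the answerer's), the third-party mwparserfromhell library turns an infobox's field-value pairs into a dict in a few lines:

```python
import mwparserfromhell

# Illustrative wikitext only; in practice this comes from the dump or from
# the revisions API call shown in the question.
wikitext = "{{Infobox basketball biography | height_ft = 6 | nationality = American }}"

code = mwparserfromhell.parse(wikitext)
infobox = next(t for t in code.filter_templates()
               if t.name.strip().lower().startswith("infobox"))
fields = {str(p.name).strip(): p.value.strip_code().strip() for p in infobox.params}
print(fields)  # {'height_ft': '6', 'nationality': 'American'}
```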

Wikipedia API + Cross-origin requests

Submitted by 妖精的绣舞 on 2019-12-17 09:30:43
Question: I'm trying to access Wikipedia using JavaScript + CORS. As far as I know, Wikipedia should support CORS: http://www.mediawiki.org/wiki/API:Cross-site_requests I tried the following script: create an XMLHttpRequest with credentials (or an XDomainRequest), add some HTTP headers ("Access-Control-Allow-Credentials", ...), and send the query. http://jsfiddle.net/lindenb/Vr7RS/

```javascript
var WikipediaCORS = {
    setMessage: function(msg) {
        var span = document.getElementById("id1");
        span.appendChild(document.createTextNode(msg));
    },
```
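
Two details from the linked page are easy to miss (my summary, not part of the asker's code): the Access-Control-Allow-* headers are response headers set by the server, so adding them to the request does nothing, and MediaWiki only honors cross-site requests whose query string carries an origin parameter (origin=* for anonymous callers). A sketch of the required URL shape, in Python for consistency with the other examples:

```python
from urllib.parse import urlencode

# An anonymous browser client must put origin=* in the query string; the
# server then answers with the Access-Control-Allow-Origin header itself.
params = {
    "action": "query",
    "titles": "Main Page",
    "format": "json",
    "origin": "*",  # required for anonymous CORS requests to MediaWiki
}
print("https://en.wikipedia.org/w/api.php?" + urlencode(params))
```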

Scrape Data from Wikipedia

Submitted by 自古美人都是妖i on 2019-12-14 03:48:54
Question: I am trying to find or build a web scraper that can go through and find every state/national park in the US, along with its GPS coordinates and land area. I have looked into frameworks like Scrapy, and I see there are projects built specifically around Wikipedia data, such as http://wiki.dbpedia.org/About. Is there any specific advantage to either one, and which would work better for loading the information into an online database? Answer 1: Let's suppose you want to parse
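
DBpedia's advantage is that the infobox data is already structured, so a SPARQL query can replace a scraper. A hypothetical sketch against the public endpoint; the class and property names here (dbo:ProtectedArea, geo:lat, geo:long) are my assumptions and should be verified against the DBpedia ontology:

```python
import requests

# Hypothetical query: every protected area that has WGS84 coordinates.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?park ?lat ?long WHERE {
  ?park a dbo:ProtectedArea ;
        geo:lat ?lat ;
        geo:long ?long .
} LIMIT 10
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["park"]["value"], row["lat"]["value"], row["long"]["value"])
```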

Finding and downloading images within the Wikipedia Dump

Submitted by 主宰稳场 on 2019-12-13 11:36:56
Question: I'm trying to find a comprehensive list of all images on Wikipedia, which I can then filter down to the public-domain ones. I've downloaded the SQL dumps from here: http://dumps.wikimedia.org/enwiki/latest/ and studied the DB schema: http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2193px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png I think I understand it, but when I pick a sample image from a Wikipedia page I can't find it
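
A likely explanation for the missing sample (my inference; the question is cut off here): most files used on English Wikipedia are hosted on Wikimedia Commons, so they appear in the commonswiki dumps rather than in enwiki's image table. The imageinfo API shows where a given file actually lives, plus license metadata useful for the public-domain filter:

```python
import requests

# "File:Example.jpg" is a stand-in title; iiprop=extmetadata includes the
# license fields, and the returned descriptionurl reveals whether the file
# is hosted locally or on Commons.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "titles": "File:Example.jpg",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "format": "json",
    },
)
print(resp.json())
```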

preg_match_all not returning expected values

Submitted by 倖福魔咒の on 2019-12-12 19:05:55
Question: The code below uses the Wikipedia API to return data. I would like to output a string containing the industry, but I cannot understand why preg_match_all does not match and return the industry-related string. In this example for UBS, I would like to see "industry =[[Banking]], [[Financial services]]" returned. This string can be seen when using print_r to output the data. I'm sure I'm misunderstanding or missing something simple; please assist.

```php
<html>
<body>
<form method="post">
Search:
```
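
Since the asker's pattern isn't shown in this excerpt, here is the matching logic itself, sketched with Python's re module for consistency with the other examples on this page (a preg_match_all call would use the same pattern; a wikitext parser is more robust than either):

```python
import re

# Illustrative wikitext line of the kind print_r reveals in the response.
wikitext = "| industry = [[Banking]], [[Financial services]]"

# Match "industry = ..." up to the end of the line; in PHP the same
# pattern would need the /m modifier when scanning a multi-line page.
matches = re.findall(r"industry\s*=\s*(.+)", wikitext)
print(matches)  # ['[[Banking]], [[Financial services]]']
```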