wikipedia-api

MediaWiki URL parameters without values

Submitted by 亡梦爱人 on 2019-12-19 06:14:10
Question: The query part of a URL seems to consist of key-value pairs separated by & and associated by =. I've taken to always using jQuery's $.param() function to URL-encode my query strings, because I find it makes my code more readable and maintainable. In the past couple of days I've been calling the MediaWiki API, but when cleaning up my working prototype's hard-coded URLs to use $.param(), I noticed that some MediaWiki API URLs include query parameters with keys but no values! api.php?action=query
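
A note on why those bare keys work (mine, not from the truncated thread): MediaWiki treats boolean parameters as true whenever the key is present, regardless of value, so an encoder that emits "key=" is equivalent to the bare "key" form. A minimal Python sketch:

```python
from urllib.parse import urlencode

# MediaWiki boolean flags such as "redirects" are keyed on presence, not
# value, so encoding them with an empty string ("redirects=") behaves the
# same as the bare "redirects" key seen in documentation examples.
params = {"action": "query", "titles": "Main Page", "redirects": "", "format": "json"}
print("https://en.wikipedia.org/w/api.php?" + urlencode(params))
# -> ...api.php?action=query&titles=Main+Page&redirects=&format=json
```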

Get first lines of Wikipedia Article

Submitted by 本小妞迷上赌 on 2019-12-18 11:53:32
Question: I have a Wikipedia article and I want to fetch the first z lines (or the first x characters, or the first y words; it doesn't matter) from it. The problem: I can get either the source wikitext (via the API) or the parsed HTML (via a direct HTTP request, possibly of the print version), but how can I find the first lines that are actually displayed? Normally the source (both HTML and wikitext) starts with the infoboxes and images, and the first real displayed text is somewhere further down in the code. For example: Albert
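
One route worth knowing about (my suggestion; the thread's answers are cut off in this excerpt) is the TextExtracts extension, enabled on Wikipedia, which returns the rendered lead text with infoboxes, images, and markup already stripped. A Python sketch, guessing "Albert Einstein" from the truncated example:

```python
import requests

# prop=extracts comes from the TextExtracts extension; exintro limits the
# result to the text before the first section heading, i.e. the lead.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # lead section only
        "explaintext": 1,  # plain text instead of HTML
        "titles": "Albert Einstein",
        "format": "json",
        "formatversion": 2,
    },
)
print(resp.json()["query"]["pages"][0]["extract"][:500])  # first ~500 chars
```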

How to get information from the infobox of a Wikipedia article using the Wikipedia API?

Submitted by ⅰ亾dé卋堺 on 2019-12-18 09:40:24
Question: I'm trying to get the lead actor's name from a movie's Wikipedia article. I tried different values for prop; prop=info seems the most relevant, but it doesn't contain the information in the article's infobox. See: http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Casino_Royale_(2006_film)&format=jsonfm Is it possible to extract infobox information using the Wikipedia API? Answer 1: The MediaWiki API doesn't understand infoboxes. So, you have basically two options: Parse the infobox
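
The first of those two options, parsing the infobox out of the raw wikitext, might look like this in Python with the third-party mwparserfromhell library (my choice of parser, not necessarily the answerer's):

```python
import requests
import mwparserfromhell  # third-party wikitext parser

# Fetch the article's raw wikitext...
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "Casino Royale (2006 film)",
        "format": "json",
        "formatversion": 2,
    },
)
wikitext = resp.json()["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

# ...then pull the "starring" field out of the first Infobox template.
code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    if template.name.strip().lower().startswith("infobox") and template.has("starring"):
        print(template.get("starring").value.strip_code().strip())
        break
```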

How to parse Wikipedia XML with PHP?

Submitted by 走远了吗. on 2019-12-17 19:29:46
Question: How do I parse Wikipedia XML with PHP? I tried it with SimplePie, but got nothing. Here is a link whose data I want to get: http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content&format=xml Edit code:

```php
<?php
define("EMAIL_ADDRESS", "youlichika@hotmail.com");
$ch = curl_init();
$cv = curl_version();
$user_agent = "curl ${cv['version']} (${cv['host']}) libcurl/${cv['version']} ${cv['ssl_version']} zlib
```
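
Whatever HTTP client fetches it, the response for that query nests each article's text under query/pages/page/revisions/rev. A sketch of walking that structure (in Python, to keep this page's examples in one language; the PHP equivalent would feed the body to simplexml_load_string):

```python
import requests
import xml.etree.ElementTree as ET

# Same query as the asker's URL, format=xml.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "generator": "allpages",
        "gaplimit": 2,
        "gapfilterredir": "nonredirects",
        "gapfrom": "Re",
        "prop": "revisions",
        "rvprop": "content",
        "format": "xml",
    },
)
root = ET.fromstring(resp.content)
for page in root.iter("page"):
    rev = page.find("./revisions/rev")
    print(page.get("title"), (rev.text or "")[:80] if rev is not None else "")
```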

Is there any API in Java to access Wikipedia data?

Submitted by 倖福魔咒の on 2019-12-17 18:17:26
Question: I want to know whether there is an API or a query interface through which I can access Wikipedia data. Answer 1: MediaWiki, the wiki platform that Wikipedia uses, does have an HTTP-based API. See MediaWiki API. For example, to get pages with the title "Stackoverflow", you call http://en.wikipedia.org/w/api.php?action=query&titles=Stackoverflow There are some (incomplete) Java wrappers around the API; see the Client Code - Java section of the API page for more detail. Answer 2: For use with Java, try http:/
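
Whichever Java wrapper you choose, it is ultimately issuing plain HTTP calls like the one in the answer; here is that same call sketched in Python only to keep this page's examples in one language:

```python
import requests

# The raw HTTP request that the various Java client libraries wrap.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={"action": "query", "titles": "Stackoverflow", "format": "json"},
)
print(resp.json()["query"]["pages"])
```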

Parsing a Wikipedia dump

Submitted by 一世执手 on 2019-12-17 10:59:18
Question: For example, using this Wikipedia dump: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm Is there an existing library for Python that I can use to create an array with the mapping of fields to values? For example: {height_ft: 6}, {nationality: American} Answer 1: It looks like you really want to be able to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built
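
mwlib is one option; as an alternative (my substitution, not the answerer's), the third-party mwparserfromhell library turns an infobox's field-value pairs into a dict in a few lines:

```python
import mwparserfromhell

# Illustrative wikitext only; in practice this comes from the dump or from
# the revisions API call shown in the question.
wikitext = "{{Infobox basketball biography | height_ft = 6 | nationality = American }}"

code = mwparserfromhell.parse(wikitext)
infobox = next(t for t in code.filter_templates()
               if t.name.strip().lower().startswith("infobox"))
fields = {str(p.name).strip(): p.value.strip_code().strip() for p in infobox.params}
print(fields)  # {'height_ft': '6', 'nationality': 'American'}
```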

Wikipedia API + Cross-origin requests

Submitted by 妖精的绣舞 on 2019-12-17 09:30:43
Question: I'm trying to access Wikipedia using JavaScript + CORS. As far as I know, Wikipedia should support CORS: http://www.mediawiki.org/wiki/API:Cross-site_requests I tried the following script: create an XMLHttpRequest with credentials (or an XDomainRequest), add some HTTP headers ("Access-Control-Allow-Credentials", ...), and send the query. http://jsfiddle.net/lindenb/Vr7RS/

```javascript
var WikipediaCORS = {
    setMessage: function(msg) {
        var span = document.getElementById("id1");
        span.appendChild(document.createTextNode(msg));
    },
```
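
Two details from the linked page are easy to miss (my summary, not part of the asker's code): the Access-Control-Allow-* headers are response headers set by the server, so adding them to the request does nothing, and MediaWiki only honors cross-site requests whose query string carries an origin parameter (origin=* for anonymous callers). A sketch of the required URL shape, in Python for consistency with the other examples:

```python
from urllib.parse import urlencode

# An anonymous browser client must put origin=* in the query string; the
# server then answers with the Access-Control-Allow-Origin header itself.
params = {
    "action": "query",
    "titles": "Main Page",
    "format": "json",
    "origin": "*",  # required for anonymous CORS requests to MediaWiki
}
print("https://en.wikipedia.org/w/api.php?" + urlencode(params))
```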

Scrape Data from Wikipedia

Submitted by 自古美人都是妖i on 2019-12-14 03:48:54
Question: I am trying to find or build a web scraper that can go through and find every state/national park in the US, along with its GPS coordinates and land area. I have looked into frameworks like Scrapy, and I see there are projects built specifically around Wikipedia data, such as http://wiki.dbpedia.org/About. Is there any specific advantage to either one, and which would work better for loading the information into an online database? Answer 1: Let's suppose you want to parse
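
DBpedia's advantage is that the infobox data is already structured, so a SPARQL query can replace a scraper. A hypothetical sketch against the public endpoint; the class and property names here (dbo:ProtectedArea, geo:lat, geo:long) are my assumptions and should be verified against the DBpedia ontology:

```python
import requests

# Hypothetical query: every protected area that has WGS84 coordinates.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?park ?lat ?long WHERE {
  ?park a dbo:ProtectedArea ;
        geo:lat ?lat ;
        geo:long ?long .
} LIMIT 10
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["park"]["value"], row["lat"]["value"], row["long"]["value"])
```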

Finding and downloading images within the Wikipedia Dump

Submitted by 主宰稳场 on 2019-12-13 11:36:56
Question: I'm trying to find a comprehensive list of all images on Wikipedia, which I can then filter down to the public-domain ones. I've downloaded the SQL dumps from here: http://dumps.wikimedia.org/enwiki/latest/ and studied the DB schema: http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2193px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png I think I understand it, but when I pick a sample image from a Wikipedia page I can't find it
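
A likely explanation for the missing sample (my inference; the question is cut off here): most files used on English Wikipedia are hosted on Wikimedia Commons, so they appear in the commonswiki dumps rather than in enwiki's image table. The imageinfo API shows where a given file actually lives, plus license metadata useful for the public-domain filter:

```python
import requests

# "File:Example.jpg" is a stand-in title; iiprop=extmetadata includes the
# license fields, and the returned descriptionurl reveals whether the file
# is hosted locally or on Commons.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "titles": "File:Example.jpg",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "format": "json",
    },
)
print(resp.json())
```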

preg_match_all not returning expected values

Submitted by 倖福魔咒の on 2019-12-12 19:05:55
Question: The code below uses the Wikipedia API to return data. I would like to output a string containing the industry, but I cannot understand why preg_match_all does not match and return the industry-related string. In this example for UBS, I would like to see "industry =[[Banking]], [[Financial services]]" returned. This string can be seen when using print_r to output the data. I'm sure I'm misunderstanding or missing something simple; please assist.

```php
<html>
<body>
<form method="post">
Search:
```
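
Since the asker's pattern isn't shown in this excerpt, here is the matching logic itself, sketched with Python's re module for consistency with the other examples on this page (a preg_match_all call would use the same pattern; a wikitext parser is more robust than either):

```python
import re

# Illustrative wikitext line of the kind print_r reveals in the response.
wikitext = "| industry = [[Banking]], [[Financial services]]"

# Match "industry = ..." up to the end of the line; in PHP the same
# pattern would need the /m modifier when scanning a multi-line page.
matches = re.findall(r"industry\s*=\s*(.+)", wikitext)
print(matches)  # ['[[Banking]], [[Financial services]]']
```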