Question
This follows up on the question I asked yesterday. I was able to get the required data. The final data is like this. Please follow this link.
I tried the following code to get all the infobox data:
content = content.split("}}\n");
for(k in content)
{
    if(content[k].search("Infobox")==2)
    {
        var infobox = content[k];
        alert(infobox);
        infobox = infobox.replace("{{","");
        alert(infobox);
        infobox = infobox.split("\n|");
        //alert(infobox[0]);
        var infohtml="";
        for(l in infobox)
        {
            if(infobox[l].search("=")>0)
            {
                var line = infobox[l].split("=");
                infohtml = infohtml+"<tr><td>"+line[0]+"</td><td>"+line[1]+"</td></tr>";
            }
        }
        infohtml="<table>"+infohtml+"</table>";
        $('#con').html(infohtml);
        break;
    }
}
I initially thought each element was enclosed in {{ }}, so I wrote this code. However, it does not give me the entire infobox data. There is this element
{{Sfn|National Informatics Centre|2005}}
occurring inside the infobox, which ends my infobox data early.
It seems to be far simpler without using JSON. Please help me.
Answer 1:
Have you tried DBpedia? AFAIK they provide template usage information. There is also a toolserver tool named Templatetiger, which does template extraction from the static dumps (not live).
However, I once wrote a tiny snippet to extract templates from wikitext in JavaScript:
var title; // of the template
var wikitext; // of the page
// Matches {{title ...}}, allowing one level of nested {{...}} or [[...]] inside
var templateRegexp = new RegExp("{{\\s*"+(title.indexOf(":")>-1?"(?:Vorlage:|Template:)?"+title:title)+"([^[\\]{}]*(?:{{[^{}]*}}|\\[?\\[[^[\\]]*\\]?\\])?[^[\\]{}]*)+}}", "g");
// Matches a single "|..." parameter, again allowing one nested template or link
var paramRegexp = /\s*\|[^{}|]*?((?:{{[^{}]*}}|\[?\[[^[\]]*\]?\])?[^[\]{}|]*)*/g;
wikitext.replace(templateRegexp, function(template){
    // logabout(template, "input ");
    var parameters = template.match(paramRegexp);
    if (!parameters) {
        // the original logged page.title here, but no "page" object is defined in this snippet
        console.log(title + " without parameters:\n" + template);
        parameters = [];
    }
    var unnamed = 1; // counter used as the key for positional (unnamed) parameters
    var p = parameters.reduce(function(map, line) {
        line = line.replace(/^\s*\|/,"");
        var i = line.indexOf("=");
        map[line.substr(0,i).trim() || unnamed++] = line.substr(i+1).trim();
        return map;
    }, {});
    // you have an object "p" in here containing the template parameters
});
It handles one level of nested templates, but it is still very error-prone. Parsing wikitext with regexps is as evil as trying to do it on HTML :-)
It may be easier to query the parse tree from the API: api.php?action=query&prop=revisions&rvprop=content&rvgeneratexml=1&titles=.... From that parse tree you will be able to extract the templates easily.
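For the specific problem in the question (a nested {{Sfn|...}} cutting the infobox short), one possible alternative to regexps is to count braces. The following is only a minimal sketch, not the method from the snippet above: extractTemplate and parseParams are hypothetical helper names, the sample wikitext is made up for illustration, and it assumes the template name contains no regexp special characters and that triple braces ({{{...}}}) do not occur.

```javascript
// Hypothetical helper: return the full "{{name ...}}" text with balanced braces, or null.
function extractTemplate(wikitext, name) {
  var start = wikitext.search(new RegExp("\\{\\{\\s*" + name, "i"));
  if (start === -1) return null;
  var depth = 0;
  for (var i = start; i < wikitext.length - 1; i++) {
    var pair = wikitext.substr(i, 2);
    if (pair === "{{") {
      depth++;
      i++; // skip the second brace
    } else if (pair === "}}") {
      depth--;
      i++;
      if (depth === 0) return wikitext.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

// Hypothetical helper: split a template on top-level "|" only and build a name/value map.
function parseParams(template) {
  var body = template.slice(2, -2); // drop the outer {{ and }}
  var parts = [], current = "", depth = 0;
  for (var i = 0; i < body.length; i++) {
    var pair = body.substr(i, 2);
    if (pair === "{{" || pair === "[[") {
      depth++; current += pair; i++;
    } else if (pair === "}}" || pair === "]]") {
      depth--; current += pair; i++;
    } else if (body[i] === "|" && depth === 0) {
      parts.push(current); current = "";
    } else {
      current += body[i];
    }
  }
  parts.push(current);
  var params = {}, unnamed = 1;
  parts.slice(1).forEach(function (part) { // parts[0] is the template name
    var eq = part.indexOf("=");
    if (eq === -1) params[unnamed++] = part.trim();
    else params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  });
  return params;
}

// Made-up sample wikitext, reusing the {{Sfn|...}} element from the question.
var sample = "Intro text\n{{Infobox settlement\n| name = Exampleville\n" +
  "| population = 12345{{Sfn|National Informatics Centre|2005}}\n" +
  "| link = [[Tamil Nadu|TN]]\n}}\nBody text {{Other|x}}";
var infobox = extractTemplate(sample, "Infobox");
var params = parseParams(infobox);
console.log(params);
```

Because the scanner tracks nesting depth instead of matching a fixed pattern, the nested {{Sfn|...}} stays inside the population value rather than ending the infobox. The resulting object can be turned into table rows the same way as in the question's code.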
Source: https://stackoverflow.com/questions/10207480/wikimedia-api-getting-relavant-data-from-json-string