Question
Is it possible to fetch information from the Wikipedia API by movie category? For example, I have a URL that searches for "avatar", but I don't know how to search for the Avatar movie.
https://en.wikipedia.org/w/api.php?&titles=avatar&format=xml&action=query&prop=extracts|categories|categoryinfo|pageterms|pageprops|pageimages&exintro=&explaintext=&cllimit=max&piprop=original
Answer 1:
Filtering by a "movies" category is not easy, because the film categories are deeply nested. There is an alternative, though: articles about films include Template:Infobox film, and the MediaWiki API can list every page that embeds it:
https://en.wikipedia.org/w/api.php?format=xml&action=query&list=embeddedin&einamespace=0&eilimit=500&eititle=Template:Infobox_film
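For reference, einamespace=0 restricts the list to the main (article) namespace, eilimit=500 asks for 500 titles per request, and eititle names the embedded template. When more results remain, the response carries an eicontinue token that has to be sent back on the next request. A minimal sketch of building that URL follows; the EmbeddedInQuery class and its BuildUrl method are hypothetical names used only for illustration, not part of any library.

```csharp
using System;

public static class EmbeddedInQuery
{
    // Hypothetical helper: builds the list=embeddedin URL for one page of results.
    public static string BuildUrl(string template, string continueToken = null, int limit = 500)
    {
        var url = "https://en.wikipedia.org/w/api.php?format=xml&action=query&list=embeddedin" +
                  "&einamespace=0" +                                // articles only
                  "&eilimit=" + limit +                             // titles per request
                  "&eititle=" + Uri.EscapeDataString(template);     // encodes ":" and spaces

        // The continuation token contains "|", so it is URL-encoded as well.
        if (!string.IsNullOrEmpty(continueToken))
            url += "&eicontinue=" + Uri.EscapeDataString(continueToken);

        return url;
    }
}
```

Calling EmbeddedInQuery.BuildUrl("Template:Infobox film") gives the first-page URL; passing the token from the previous response as the second argument gives the next page.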
Then decide how to search the returned titles: by regex, Contains() or StartsWith(), case-insensitively or not, returning the first match or all matches, and so on.
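These matching choices can be sketched as one small filter over the titles already fetched. This is only an illustration: the MatchMode enum, the TitleFilter class and the sample titles in the usage note are assumptions made for the example, not part of the MediaWiki API or the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical names, introduced only for this sketch.
public enum MatchMode { StartsWith, Contains, Pattern }

public static class TitleFilter
{
    // Returns every title that matches `word` under the chosen mode.
    public static List<string> Filter(IEnumerable<string> titles, string word,
                                      MatchMode mode, bool ignoreCase = true)
    {
        var comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
        var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

        Func<string, bool> matches;
        switch (mode)
        {
            case MatchMode.StartsWith:
                matches = t => t.StartsWith(word, comparison);
                break;
            case MatchMode.Contains:
                // IndexOf with a StringComparison is available on all .NET versions.
                matches = t => t.IndexOf(word, comparison) >= 0;
                break;
            default:
                // Pattern mode treats `word` as a regular expression.
                matches = t => Regex.IsMatch(t, word, options);
                break;
        }
        return titles.Where(matches).ToList();
    }
}
```

For example, given the made-up list { "Avatar (2009 film)", "Avatar (2004 film)", "A Film About an avatar" }, Filter(list, "avatar", MatchMode.StartsWith) keeps the first two titles, while MatchMode.Contains keeps all three. To get only the first match, call FirstOrDefault() on the result.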
Here is an example in C# that returns all movie articles whose titles start with "Avatar":
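using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

var articles = GetMovies("Avatar");
...
private static List<string> GetMovies(string word)
{
    // List every main-namespace page that embeds Template:Infobox film, 500 titles per request.
    var api = "https://en.wikipedia.org/w/api.php?format=xml&action=query&list=embeddedin&" +
              "einamespace=0&eilimit=500&eititle=Template:Infobox_film";
    // Escape the search word so characters such as "(" or "." are matched literally.
    var pattern = "^" + Regex.Escape(word) + "\\b";
    var articles = new List<string>();
    var next = string.Empty;
    while (true)
    {
        var request = (HttpWebRequest)WebRequest.Create(api + next);
        // Wikimedia asks API clients to send a descriptive User-Agent; this value is a placeholder.
        request.UserAgent = "MovieTitleSearch/0.1 (contact: you@example.com)";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var xElement = XElement.Parse(reader.ReadToEnd());
            // Each <ei> element is one embedding page; keep the titles that match.
            articles.AddRange(xElement.Descendants("ei")
                .Select(x => x.Attribute("title").Value)
                .Where(x => Regex.IsMatch(x, pattern)));
            // A <continue> element is present while more results remain.
            var cont = xElement.Element("continue");
            if (cont == null) break;
            // The continuation token contains "|", so URL-encode it.
            next = "&eicontinue=" + Uri.EscapeDataString(cont.Attribute("eicontinue").Value);
        }
    }
    return articles;
}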
This returns:
Avatar (2009 film)
Avatar (2004 film)
Avatar (1916 film)
Source: https://stackoverflow.com/questions/34861428/how-to-get-information-from-movies-wikipedia-category-by-api