How to extract data from a Wikipedia article?

时光说笑 2021-02-03 14:09

I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/

2 Answers
  • 2021-02-03 14:50

    action=parse doesn't work well with per-section parsing. Consider this short example:

    Foo is a bar<ref>really!</ref>
    == References ==
    <references/>
    

    Parsing just the zeroth section results in a red error message about <ref> without <references/>, while parsing the first section results in an empty references list.

    However, there's a better solution: action=mobileview is not only free from this problem, but it's also specifically intended for mobile apps and gives you mobile-optimized HTML.
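The reference problem in the wikitext example from this answer can be illustrated offline. The sketch below is a simplified model, not MediaWiki's parser: `split_sections` is a hypothetical helper that cuts the text at heading lines, which is enough to show that the `<ref>` and the `<references/>` tag end up in different sections.

```python
# Wikitext example copied from the answer.
WIKITEXT = """Foo is a bar<ref>really!</ref>
== References ==
<references/>
"""


def split_sections(wikitext):
    """Split wikitext at heading lines; element 0 is the lead (zeroth) section.

    Simplified assumption: any line that starts and ends with "==" is a
    heading. Real MediaWiki section handling is more involved.
    """
    sections = [[]]
    for line in wikitext.splitlines():
        stripped = line.rstrip()
        if stripped.startswith("==") and stripped.endswith("=="):
            sections.append([])  # a heading opens a new section
        sections[-1].append(line)
    return ["\n".join(lines) for lines in sections]


for index, section in enumerate(split_sections(WIKITEXT)):
    print(index,
          "contains <ref>:", "<ref>" in section,
          "| contains <references/>:", "<references/>" in section)
```

Section 0 holds the footnote but no list to render it into, and section 1 holds the list but no footnotes, which matches the two failure modes described in this answer.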

  • 2021-02-03 14:52

    Unfortunately, it seems the mediawiki.org documentation for parse doesn't tell you how to do this, but the documentation in the API itself does: you can use the section parameter, and you can use prop=sections to get the list of sections.

    So, you could first use:

    http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=sections

    to get the list of sections and then

    http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=text&section=26

    to get the HTML for a certain section.
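The two requests above could be wired together roughly as follows. This is a minimal sketch under a few assumptions: it uses format=json instead of the format=xml shown in the URLs (JSON is simply easier to handle here), it targets the HTTPS version of the same api.php endpoint, and the names `build_parse_url`, `find_section_index`, `fetch_json`, the User-Agent string, and the trimmed `SAMPLE_RESPONSE` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: HTTPS variant of the api.php endpoint used in the answer.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, prop, section=None):
    """Build an action=parse URL; `section` is a section index, if given."""
    params = {"format": "json", "action": "parse", "page": page, "prop": prop}
    if section is not None:
        params["section"] = section
    return API_URL + "?" + urlencode(params)


def find_section_index(sections_response, heading):
    """Return the index of the section with this heading text, or None."""
    for section in sections_response["parse"]["sections"]:
        if section["line"] == heading:
            return section["index"]
    return None


def fetch_json(url):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    # Hypothetical User-Agent; identify your own app here.
    request = Request(url, headers={"User-Agent": "section-demo/0.1 (example)"})
    with urlopen(request) as response:
        return json.load(response)


# Made-up, heavily trimmed response in the shape prop=sections returns as JSON.
SAMPLE_RESPONSE = {
    "parse": {
        "title": "Android (operating system)",
        "sections": [
            {"line": "History", "index": "1"},
            {"line": "Features", "index": "2"},
        ],
    }
}

print(build_parse_url("Android_(operating_system)", "sections"))
print(find_section_index(SAMPLE_RESPONSE, "Features"))
```

In an app you would call `fetch_json(build_parse_url(page, "sections"))`, pick an index with `find_section_index`, and then request `build_parse_url(page, "text", index)` to get that section's HTML.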
