XPath to get markup between two headings

野性不改 2021-01-24 01:17

I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I thought that I could just target divs containing content with XPa

2 Answers
  •  迷失自我
    2021-01-24 01:46

    Yes, you're on the right track with XPath -- it's ideal for selecting parts of an XML document.

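    For example, for this XML (the h2 and div elements are siblings under a common parent),

      <h2>Title A</h2>
      <div>Some Content</div>
      <div>More Content</div>
      <h2>Title B</h2>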

    this XPath,

    //div[preceding-sibling::h2 = 'Title A' and following-sibling::h2 = 'Title B']
    

    will select this content,

      <div>Some Content</div>
      <div>More Content</div>

    between the two h2 titles, as requested.
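
To try the expression from a script, here is a minimal sketch in Python. It assumes the third-party lxml package is installed (`pip install lxml`), and the `<r>` wrapper element is a hypothetical stand-in for whatever parent holds the headings; only the h2 and div siblings matter to the XPath.

```python
from lxml import etree  # third-party package, assumed to be installed

# Assumed markup: <r> is a hypothetical wrapper, not taken from the answer.
xml = """
<r>
  <h2>Title A</h2>
  <div>Some Content</div>
  <div>More Content</div>
  <h2>Title B</h2>
</r>
""".strip()

root = etree.fromstring(xml)

# Keep each div that has an earlier sibling h2 'Title A'
# and a later sibling h2 'Title B'.
expr = "//div[preceding-sibling::h2 = 'Title A' and following-sibling::h2 = 'Title B']"
selected = root.xpath(expr)

texts = [el.text for el in selected]
print(texts)  # ['Some Content', 'More Content']
```

The predicate compares the string value of each sibling h2, so it only matches when the heading text is exactly 'Title A' or 'Title B'.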


    Update to address OP's self-answer:

    For this new XML example,

    Summary

    Paragraph

    • List1
    • List2
    • List3

    Paragraph

    Location

    Paragraph

    the XPath I provided above can easily be adapted,

    //*[preceding-sibling::h2 = 'Summary' and following-sibling::h2 = 'Location']
    

    to select this XML,

    Paragraph

    • List1
    • List2
    • List3

    Paragraph

    as requested.
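
If a full XPath engine is not at hand, the same selection can be approximated with a plain loop over siblings. Python's standard `xml.etree.ElementTree` supports only a subset of XPath without the sibling axes, so the sketch below swaps the XPath for a hand-written check. The helper names `is_heading` and `between_headings` are hypothetical, and the p/ul/li tags and the div wrapper are assumptions; only the h2 headings are named by the XPath itself.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the p/ul/li tags and the <div> wrapper are guesses.
XML = """
<div>
  <h2>Summary</h2>
  <p>Paragraph</p>
  <ul><li>List1</li><li>List2</li><li>List3</li></ul>
  <p>Paragraph</p>
  <h2>Location</h2>
  <p>Paragraph</p>
</div>
""".strip()


def is_heading(el, title, tag="h2"):
    """True if el is a heading whose full text content equals title."""
    return el.tag == tag and "".join(el.itertext()) == title


def between_headings(root, start, end):
    """Hypothetical helper: elements that have an earlier sibling heading
    `start` and a later sibling heading `end`, mirroring the XPath predicate."""
    selected = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            has_start = any(is_heading(c, start) for c in children[:i])
            has_end = any(is_heading(c, end) for c in children[i + 1:])
            if has_start and has_end:
                selected.append(child)
    return selected


root = ET.fromstring(XML)
tags = [el.tag for el in between_headings(root, "Summary", "Location")]
print(tags)  # ['p', 'ul', 'p']
```

The li elements are not selected on their own because they have no h2 siblings; they come along as children of the selected ul. The paragraph after 'Location' is excluded because no 'Location' heading follows it.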
