XPath to get markup between two headings

野性不改 2021-01-24 01:17

I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I thought that I could just target divs containing content with XPa

2 answers
  •  一整个雨季
    2021-01-24 01:59

    With help from kjhughes's suggestion, I managed to get the code working.

    I was unable to make the = 'Text' part work, but replaced it with [text() = 'text']

    That alone wasn't enough, as the title of the content I need is located inside a span in an h2 tag, so I had to adapt the XPath a bit more.

    This is what I came up with:

    //*[preceding-sibling::h2/span[text() = 'Summary'] and following-sibling::h2/span[text() = 'Location']]
    

    I tested it using http://www.xpathtester.com/xpath on this HTML:

    Summary

    Paragraph

    • List1
    • List2
    • List3

    Paragraph

    Location

    Paragraph

    Which gave me the following result:

    Paragraph

    • List1
    • List2
    • List3

    Paragraph
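
For anyone who wants to check this selection without an online tester, here is a minimal sketch in Python using only the standard library. The sample markup is an assumption, reconstructed from the rendered text above (an h2 containing a span, then paragraphs and a list), and `heading_text` and `between_headings` are hypothetical helper names. Python's built-in ElementTree only supports a small XPath subset with no sibling axes, so the sketch walks the siblings by hand instead of evaluating the expression itself.

```python
import xml.etree.ElementTree as ET

# Assumed markup: reconstructed from the rendered sample, not the original test file.
SAMPLE = """
<div>
  <h2><span>Summary</span></h2>
  <p>Paragraph</p>
  <ul><li>List1</li><li>List2</li><li>List3</li></ul>
  <p>Paragraph</p>
  <h2><span>Location</span></h2>
  <p>Paragraph</p>
</div>
""".strip()


def heading_text(el):
    """Return the text of the span inside an h2, or None for any other element."""
    if el.tag != "h2":
        return None
    span = el.find("span")
    return span.text if span is not None else None


def between_headings(parent, start, end):
    """Collect the children of parent that sit between the two named h2 headings.

    Returns an empty list when either heading is missing, which mirrors the
    XPath: it needs both a preceding and a following heading to match.
    """
    selected = []
    inside = False
    for child in parent:
        title = heading_text(child)
        if title == start:
            inside = True
        elif title == end and inside:
            return selected
        elif inside:
            selected.append(child)
    return []


root = ET.fromstring(SAMPLE)
nodes = between_headings(root, "Summary", "Location")
print([n.tag for n in nodes])  # ['p', 'ul', 'p']
```

With a full XPath 1.0 engine the expression can be evaluated directly, so this walk is only needed where sibling axes are unavailable.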
