I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I thought that I could just target divs containing content with XPa
With the help of kjhughes' suggestion, I managed to get the code working.
I was unable to make the = 'Text' part work, so I replaced it with [text() = 'text'].
That alone wasn't enough, as the title of the content I need is located inside a span in an h2 tag, so I had to adapt the XPath a bit more.
This is what I came up with:
//*[preceding-sibling::h2[span[text() = 'Summary']] and following-sibling::h2[span[text() = 'Location']]]
I tested it using http://www.xpathtester.com/xpath on this HTML:
Summary
Paragraph
- List1
- List2
- List3
Paragraph
Location
Paragraph
Which gave me the following result:
Paragraph
- List1
- List2
- List3
Paragraph
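
For anyone who wants to reproduce the test offline, here is a minimal Python sketch of the same selection. It rests on several assumptions: the markup is a hypothetical reconstruction of the test HTML (h2 headings wrapping a span, with p and ul siblings inside a div), and the helper names is_heading and siblings_between are made up for illustration. Python's standard xml.etree.ElementTree only supports a subset of XPath without the preceding-sibling and following-sibling axes, so the sketch walks the sibling list directly instead of evaluating the expression. It approximates the text() test with span.text.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the test markup (assumption, not the original HTML).
HTML = """
<div>
  <h2><span>Summary</span></h2>
  <p>Paragraph</p>
  <ul>
    <li>List1</li>
    <li>List2</li>
    <li>List3</li>
  </ul>
  <p>Paragraph</p>
  <h2><span>Location</span></h2>
  <p>Paragraph</p>
</div>
"""


def is_heading(element, title):
    """True if element is an <h2> with a child <span> whose text equals title."""
    if element.tag != "h2":
        return False
    return any(span.text == title for span in element.findall("span"))


def siblings_between(parent, start_title, end_title):
    """Children of parent located after the first start heading
    and before the last end heading (mirrors the two sibling conditions)."""
    children = list(parent)
    starts = [i for i, c in enumerate(children) if is_heading(c, start_title)]
    ends = [i for i, c in enumerate(children) if is_heading(c, end_title)]
    if not starts or not ends:
        return []
    # An empty slice results if the end heading does not come after the start.
    return children[starts[0] + 1 : ends[-1]]


root = ET.fromstring(HTML)
selected = siblings_between(root, "Summary", "Location")

# Tag names and whitespace-normalised text of each selected element.
tags = [el.tag for el in selected]
texts = [" ".join("".join(el.itertext()).split()) for el in selected]

for tag, text in zip(tags, texts):
    print(tag, "->", text)
```

With a library that implements full XPath 1.0, the expression itself could be evaluated instead of walking the siblings by hand. Note that any other h2 sitting between the two headings would also be selected, since the condition only looks at what comes before and after each element.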