XPath to get markup between two headings

野性不改 2021-01-24 01:17

I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I thought that I could just target divs containing content with XPa

2 answers

迷失自我 (original poster)

2021-01-24 01:46

Yes, you're on the right track with XPath -- it's ideal for selecting parts of an XML document.

For example, for this XML,


   <h2>Title A</h2>
   <div>Some Content</div>
   <div>More Content</div>
   <h2>Title B</h2>

this XPath,

//div[preceding-sibling::h2 = 'Title A' and following-sibling::h2 = 'Title B']

will select this content,

<div>Some Content</div>
<div>More Content</div>

between the two h2 titles, as requested.
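
To try the expression outside a browser, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the <r> root element is a hypothetical wrapper added only so that the fragment parses as a complete document.

```python
from lxml import etree  # third-party: pip install lxml

# <r> is a hypothetical root element; any parent holding these siblings would do.
XML = """
<r>
   <h2>Title A</h2>
   <div>Some Content</div>
   <div>More Content</div>
   <h2>Title B</h2>
</r>
"""

root = etree.fromstring(XML)

# Keep each div that has some earlier sibling h2 equal to 'Title A'
# and some later sibling h2 equal to 'Title B'.
divs = root.xpath(
    "//div[preceding-sibling::h2 = 'Title A' and following-sibling::h2 = 'Title B']"
)

texts = [d.text for d in divs]
print(texts)
```

Note that the comparison is existential: a div matches if any preceding h2 sibling has that text, so the expression behaves as intended only when the two titles are unique among their siblings.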

Update to address OP's self-answer:

For this new XML example,


    <h2>Summary</h2>
    <p>Paragraph</p>
    <ul>
        <li>List1</li>
        <li>List2</li>
        <li>List3</li>
    </ul>
    <p>Paragraph</p>

    <h2>Location</h2>
    <p>Paragraph</p>

the XPath I provided above can easily be adapted,

//*[preceding-sibling::h2 = 'Summary' and following-sibling::h2 = 'Location']

to select this XML,

<p>Paragraph</p>
<ul>
   <li>List1</li>
   <li>List2</li>
   <li>List3</li>
</ul>
<p>Paragraph</p>

as requested.
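
If only the Python standard library is available, note that xml.etree.ElementTree does not support the preceding-sibling and following-sibling axes. The sketch below swaps in a plain loop over the children instead of XPath. The helper name between_headings, the <r> root, and the p/ul/li tag names are assumptions made for illustration. Unlike the XPath, it compares only each heading's own text, with surrounding whitespace stripped.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <r> and the p/ul/li tag names are assumed.
XML = """
<r>
    <h2>Summary</h2>
    <p>Paragraph</p>
    <ul>
        <li>List1</li>
        <li>List2</li>
        <li>List3</li>
    </ul>
    <p>Paragraph</p>

    <h2>Location</h2>
    <p>Paragraph</p>
</r>
"""


def between_headings(parent, start, end, heading_tag="h2"):
    """Return the children of `parent` after heading `start` and before heading `end`."""
    selected = []
    inside = False
    for child in parent:
        is_heading = child.tag == heading_tag
        title = (child.text or "").strip()
        if inside and is_heading and title == end:
            return selected  # reached the closing heading
        if inside:
            selected.append(child)
        elif is_heading and title == start:
            inside = True  # start collecting from the next sibling
    return []  # the end heading was never found, so nothing qualifies


root = ET.fromstring(XML)
section = between_headings(root, "Summary", "Location")
tags = [el.tag for el in section]
print(tags)
```

Returning an empty list when the end heading is missing mirrors the XPath, which selects nothing unless both headings are present.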
