Find next siblings until a certain one using BeautifulSoup

醉梦人生 2020-12-19 03:37

The webpage is something like this:

    section1
    article
    article
    article

2 Answers
  • 2020-12-19 04:05

    The next_siblings iterator can be helpful here as well:

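    for heading in soup.find_all('h2'):
        for sib in heading.next_siblings:
            if sib.name == 'p':
                print(sib.text)
            elif sib.name == 'h2':
                # The next heading marks the end of this section.
                break
        # Print the separator after every section, including the last one.
        print("*****")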
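
For reference, here is a self-contained sketch of the same next_siblings idea that collects the paragraphs instead of printing them. It assumes markup shaped like the sample in the other answer (h2 headings followed by p tags); the function name paragraphs_by_section and the HTML constant are made up for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed sample markup: <h2> headings, each followed by <p> tags.
HTML = """
<h2>section1</h2>
<p>article1</p>
<p>article2</p>
<p>article3</p>

<h2>section2</h2>
<p>article4</p>
<p>article5</p>
<p>article6</p>
"""


def paragraphs_by_section(html):
    """Map each <h2> heading's text to the text of the <p> tags that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for heading in soup.find_all("h2"):
        texts = []
        for sib in heading.next_siblings:
            if not isinstance(sib, Tag):
                continue  # skip whitespace and other text nodes
            if sib.name == "h2":
                break  # the next section starts here
            if sib.name == "p":
                texts.append(sib.get_text())
        sections[heading.get_text()] = texts
    return sections


if __name__ == "__main__":
    for title, articles in paragraphs_by_section(HTML).items():
        print(title, articles)
```

Returning a dict keeps the traversal separate from the output, so the same loop can feed printing, saving to a file, or further processing.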
    
  • 2020-12-19 04:14

    I think you can do something like this:

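    from bs4.element import Tag

    for section in soup.find_all('h2'):
        next_node = section.next_sibling
        # Walk forward until the siblings run out or a non-<p> tag appears.
        while next_node is not None:
            if isinstance(next_node, Tag):
                if next_node.name != 'p':
                    break
                print(next_node.get_text())
            # Text nodes (e.g. the newlines between tags) are skipped.
            next_node = next_node.next_sibling
        print("*****")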
    

    Given:

    <h2>section1</h2>
    <p>article1</p>
    <p>article2</p>
    <p>article3</p>
    
    <h2>section2</h2>
    <p>article4</p>
    <p>article5</p>
    <p>article6</p>
    

    Output:

    article1
    article2
    article3
    *****
    article4
    article5
    article6
    *****
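
If the goal is "every sibling up to a given tag", a possible alternative is to combine find_next_siblings() with itertools.takewhile. This is only a sketch: the helper name siblings_until and the short sample markup are assumptions made for illustration. Unlike the loop above, it stops at the named tag rather than at the first non-p tag.

```python
from itertools import takewhile

from bs4 import BeautifulSoup


def siblings_until(tag, stop_name):
    """Return the tag siblings after `tag`, stopping before the first one named `stop_name`."""
    # Called without arguments, find_next_siblings() yields only Tag objects,
    # so whitespace text nodes between the tags are already filtered out.
    return list(takewhile(lambda sib: sib.name != stop_name,
                          tag.find_next_siblings()))


# Hypothetical sample markup, shortened for the example.
html = (
    "<h2>section1</h2>\n<p>article1</p>\n<p>article2</p>\n"
    "<h2>section2</h2>\n<p>article3</p>"
)
soup = BeautifulSoup(html, "html.parser")

for heading in soup.find_all("h2"):
    print(heading.get_text(),
          [sib.get_text() for sib in siblings_until(heading, "h2")])
```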
    