lxml.etree, element.text doesn't return the entire text from an element

梦毁少年i 2021-02-07 10:39

I scraped some HTML via XPath, which I then converted into an etree. Something similar to this:

<td> text1 <a> link </a> text2 </td>
8 answers
  • 2021-02-07 11:10

    Another approach that seems to work well for getting all the text out of an element, including text inside child elements, is "".join(element.itertext()).
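
A minimal sketch of the itertext() approach, using the <td> fragment from the question as sample input. The fallback import to the standard library's xml.etree.ElementTree is an assumption added so the snippet also runs where lxml is not installed; both modules provide fromstring() and itertext(). The whitespace normalization at the end is an optional extra, not part of the original answer.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib module, which offers the same
    # fromstring()/itertext() calls used below.
    import xml.etree.ElementTree as etree

td = etree.fromstring("<td> text1 <a> link </a> text2 </td>")

# itertext() yields the element's own text, then each descendant's text
# and tail, in document order.
full_text = "".join(td.itertext())
print(repr(full_text))

# Optional: collapse runs of whitespace into single spaces.
normalized = " ".join(full_text.split())
print(normalized)
```

Note that itertext() includes the text of child elements such as <a>, so it fits when the whole visible text is wanted.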

  • 2021-02-07 11:10
    <td> text1 <a> link </a> text2 </td>
    

    Here's how lxml represents it (ignoring whitespace):

    td.text == 'text1'
    a.text == 'link'
    a.tail == 'text2'
    

    If you don't want the text that is inside child elements, you can collect only the children's tails:

    # .text and .tail are None when empty, so fall back to ''
    text = (td.text or '') + ''.join(el.tail or '' for el in td)
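
The text/tail model above can be checked with a short sketch. The helper name own_text is hypothetical, introduced only for illustration, and the stdlib fallback import is an assumption so the snippet runs without lxml installed. It also shows why guarding against None matters: an element with no leading text, or a child with nothing after it, has text or tail set to None rather than an empty string.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib module exposes the same .text/.tail attributes.
    import xml.etree.ElementTree as etree


def own_text(element):
    """Hypothetical helper: text directly inside `element`,
    skipping the text of its child elements."""
    parts = [element.text or ""]
    # Iterating an element yields its direct children; each child's tail
    # is the text between that child's end tag and the next sibling.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


td = etree.fromstring("<td> text1 <a> link </a> text2 </td>")
a = td[0]
print(repr(td.text), repr(a.text), repr(a.tail))
print(repr(own_text(td)))

# With no surrounding text, .text and .tail are None; the helper still works.
bare = etree.fromstring("<td><a>link</a></td>")
print(repr(own_text(bare)))
```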
    