lxml.etree, element.text doesn't return the entire text from an element

梦毁少年i 2021-02-07 10:39

I scraped some HTML via XPath, which I then converted into an etree. Something similar to this:

<td> text1 <a> link </a> text2 </td>

8 answers
  •  情歌与酒
    2021-02-07 11:10

     <td> text1 <a> link </a> text2 </td>
    

    Here's how it is (ignoring whitespace):

    td.text == 'text1'
    a.text == 'link'
    a.tail == 'text2'
    

    If you don't want the text that is inside child elements, you can collect only their tails:

    # .text and .tail are None when there is no text, so fall back to ''
    text = (td.text or '') + ''.join([el.tail or '' for el in td])
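
    To make the text/tail split concrete, here is a minimal sketch that parses the snippet from the question and checks each piece. The variable names (own_text, full_text) are made up for illustration, and the fallback import assumes the standard library's xml.etree.ElementTree is an acceptable stand-in when lxml is not installed, since it uses the same .text/.tail model.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree is close enough for this demo
    import xml.etree.ElementTree as etree

td = etree.fromstring("<td> text1 <a> link </a> text2 </td>")
a = td[0]  # the only child element

# Text before the first child belongs to the parent's .text;
# text following a child's closing tag is stored on that child's .tail.
print(repr(td.text), repr(a.text), repr(a.tail))

# Own text only: skip what is inside children, keep their tails.
own_text = (td.text or "") + "".join(el.tail or "" for el in td)

# All text, children included, in document order.
full_text = "".join(td.itertext())

print(own_text.split())
print(full_text.split())
```

    If the goal is the entire text of the element, children included, itertext() is probably the simpler choice; the tail-joining version is only needed when the children's own text should be left out.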
    
