I scraped some HTML via XPath and then converted it into an etree. Something similar to this:
<td> text1 <a> link </a> text2 </td>
Another approach that seems to work well for getting the text out of an element is "".join(element.itertext()), which also includes the text of any child elements.
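A minimal sketch of the itertext() approach, assuming the standard-library xml.etree.ElementTree (lxml.etree exposes the same method) and a sample fragment of the form <td> text1 <a> link </a> text2 </td>. The variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# Sample fragment, assumed for illustration.
td = ET.fromstring("<td> text1 <a> link </a> text2 </td>")

# itertext() walks the element and all of its descendants in document
# order, yielding each text and tail string it finds.
all_text = "".join(td.itertext())

# Collapse the runs of whitespace left over from the markup.
normalized = " ".join(all_text.split())
print(normalized)
```

Note that this keeps the link text as well, so it fits the case where you want everything inside the element.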
<td> text1 <a> link </a> text2 </td>
Here's how the text is stored on the elements (ignoring whitespace):
td.text == 'text1'
a.text == 'link'
a.tail == 'text2'
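To check this yourself, here is a small sketch, assuming the standard-library xml.etree.ElementTree (lxml.etree behaves the same way for these attributes). It prints the raw values, whitespace included:

```python
import xml.etree.ElementTree as ET

td = ET.fromstring("<td> text1 <a> link </a> text2 </td>")
a = td.find("a")

# .text is the text between an element's start tag and its first child.
print(repr(td.text))
# .text of <a> is the text inside the link itself.
print(repr(a.text))
# .tail is the text that follows an element's end tag, up to the next tag.
# 'text2' therefore belongs to <a>, not to <td>.
print(repr(a.tail))
```

Both .text and .tail are None rather than an empty string when there is no text in that position.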
If you don't want the text that is inside child elements, you can collect only their tails (text and tail may be None, hence the "or ''"):
text = (td.text or '') + ''.join(el.tail or '' for el in td)
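The same idea wrapped in a reusable function, as a rough sketch. The name own_text is hypothetical, not part of any etree API, and the standard-library xml.etree.ElementTree is assumed. It treats a missing text or tail (None) as an empty string, so it does not raise on elements without text:

```python
import xml.etree.ElementTree as ET


def own_text(element):
    """Return the text directly inside element, skipping child elements' text.

    Hypothetical helper for illustration only.
    """
    # Text before the first child (None if there is none).
    parts = [element.text or ""]
    # Text after each direct child's end tag; iterating an element
    # yields its direct children only.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


td = ET.fromstring("<td> text1 <a> link </a> text2 </td>")
result = " ".join(own_text(td).split())
print(result)

# An element with no direct text of its own gives an empty string.
empty = own_text(ET.fromstring("<td><a>link</a></td>"))
```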