I scraped some HTML via XPath and then converted it into an etree. Something similar to this:
text1 link text2 <
def get_text_recursive(node):
    return (node.text or '') + ''.join(map(get_text_recursive, node)) + (node.tail or '')
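For reference, here is a minimal sketch of how that function behaves on a small tree. The markup is a made-up stand-in (the real snippet is not reproduced here), and it assumes the standard library's `xml.etree.ElementTree` rather than lxml. It also compares the result with the built-in `itertext()`, which yields the same text pieces in document order for the subtree.

```python
import xml.etree.ElementTree as ET


def get_text_recursive(node):
    # Own text, then every child's text (each child call adds its own tail), then own tail.
    return (node.text or '') + ''.join(map(get_text_recursive, node)) + (node.tail or '')


# Hypothetical markup, assumed for illustration only.
root = ET.fromstring('<div>text1 <a href="#">link</a> text2</div>')

recursive = get_text_recursive(root)
builtin = ''.join(root.itertext())  # built-in traversal of the subtree's text

print(repr(recursive))
print(repr(builtin))
```

One difference to be aware of: the recursive version also appends the tail of the node it is first called on, while `itertext()` does not, so the two can differ when the starting node is not the root and has trailing text.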