I am using Python and need to find and retrieve all character data between tags:
I need this stuff
I then want to output
Use XPath with lxml:
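from lxml import etree
# etree.HTML() expects a string, not a file object, so read the contents first
with open("pageToParse.html", "r") as pageInMemory:
    parsedPage = etree.HTML(pageInMemory.read())
# Collect every text node nested anywhere inside <tag> elements
yourListOfText = parsedPage.xpath("//tag//text()")
# Write the text nodes out; writelines() adds no separators between them
with open("savedFile", "w") as saveFile:
    saveFile.writelines(yourListOfText)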
In my experience this is faster than Beautiful Soup.
If you want to test your XPath expressions, I find Firefox's XPather extension extremely helpful.
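To try the XPath without touching any files, here is a minimal sketch that parses an in-memory string. The sample markup and the names `html`, `texts`, and `per_tag` are made up for illustration, and `tag` is a placeholder for whatever element you actually need. It also shows one way to get a single string per element rather than a flat list of text nodes.

```python
from lxml import etree

# Hypothetical sample markup; "tag" stands in for the element you care about.
html = (
    "<html><body>"
    "<tag>I need <b>this</b> stuff</tag>"
    "<p>not this</p>"
    "<tag>and this</tag>"
    "</body></html>"
)

# etree.HTML() parses a string with lxml's forgiving HTML parser.
parsed = etree.HTML(html)

# Flat list of every text node found anywhere under a <tag> element.
texts = parsed.xpath("//tag//text()")

# One joined string per <tag> element, using itertext() on each match.
per_tag = ["".join(el.itertext()) for el in parsed.xpath("//tag")]

print(texts)
print(per_tag)
```

The flat list is convenient for dumping everything to a file, while the per-element form is probably easier to work with if you need to know which text came from which tag.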
Further Notes: