Alex Hovansky's answer is good enough, but there is a chance that the HTML is not well-formed, in which case xml_grep will fail with a parse error.
I recommend using tidy to convert the HTML to XML (XHTML) first, and then running xml_grep on the result:
tidy -asxml -utf8 html_file.html > out.xml
xml_grep 'xpath_expression' out.xml
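If you do this often, the two steps can be wrapped in a small shell function. This is a minimal sketch, assuming a POSIX-style shell with tidy, xml_grep and mktemp on the PATH; the function name html_grep is hypothetical. One detail worth handling: tidy exits with 1 when it only emitted warnings (normal for real-world HTML) and with 2 when it hit errors, so the sketch only gives up on 2.

```shell
# Hypothetical helper: repair possibly malformed HTML with tidy,
# then query the resulting XHTML with xml_grep.
# Usage: html_grep 'xpath_expression' html_file.html
html_grep() {
    xpath=$1
    html_file=$2
    xml_file=$(mktemp) || return 1

    # tidy exit status: 0 = clean, 1 = warnings only, 2 = errors.
    # Warnings are expected for sloppy HTML, so only 2 counts as failure.
    # -q keeps tidy quiet; its diagnostics on stderr are discarded here.
    status=0
    tidy -q -asxml -utf8 "$html_file" > "$xml_file" 2>/dev/null || status=$?
    if [ "$status" -ge 2 ]; then
        echo "tidy could not repair $html_file" >&2
        rm -f "$xml_file"
        return 2
    fi

    # Run the query on the well-formed output, then clean up the temp file.
    status=0
    xml_grep "$xpath" "$xml_file" || status=$?
    rm -f "$xml_file"
    return "$status"
}
```

Using a temporary file instead of out.xml keeps repeated calls from overwriting each other's output.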