HTML Agility Pack (C#)
- XPath is broken: when the HTML is cleaned up to make it XML-compliant, some tags get dropped, so you have to adjust your expressions to match the cleaned document.
- simple to use
Mozilla HTML Parser (Java)
- solid XPath support
- you have to set environment variables before it will work, which is a pain
- casting between org.dom4j.Node and org.w3c.dom.Node to get at different properties is a real pain
- dies on non-standard HTML (0.3 fixes this)
- best solution for XPath
- problems accessing data on Nodes in a NodeList; use a for(int i=1;i<=list_size;i++) loop to get around that
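For what it's worth, here is one possible reading of that loop workaround, as a minimal sketch. It assumes the loop starts at 1 because XPath positions are 1-based, and that each node is fetched with its own positional expression rather than read out of the NodeList. It uses the JDK's built-in javax.xml.xpath on a well-formed snippet, not the Mozilla parser or dom4j, and the class and method names (PositionalXPathDemo, textsByPosition) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; uses only JDK APIs, not the Mozilla parser.
class PositionalXPathDemo {

    // Returns the text of every node matched by expr, asking for each node
    // by its 1-based XPath position instead of reading it from the NodeList.
    static List<String> textsByPosition(String xml, String expr) {
        List<String> texts = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // The NodeList is used only to learn how many nodes matched.
            NodeList list = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            int listSize = list.getLength();

            // XPath positions start at 1, hence i = 1 and i <= listSize.
            for (int i = 1; i <= listSize; i++) {
                // (expr)[i] selects the i-th match in document order;
                // this overload returns the node's string value.
                texts.add(xpath.evaluate("(" + expr + ")[" + i + "]", doc));
            }
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<html><body><a>first</a><p><a>second</a></p></body></html>";
        System.out.println(textsByPosition(xml, "//a")); // prints [first, second]
    }
}
```

Evaluating one expression per node is slower than walking the list, so this only makes sense if reading from the NodeList directly is what's giving you trouble.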
Beautiful Soup (Python)
I don't have much experience with it, but here's what I've found:
- no XPath support
- nice interface for navigating HTML
Overall, I prefer the Mozilla HTML Parser.