Writing an HTML Parser
Question: I am attempting (or planning to attempt) to write the simplest possible program that parses an HTML document into a tree. After googling, I have found many answers saying "don't do it, it's been done" (or words to that effect), references to examples of HTML parsers, and a rather emphatic article on why one shouldn't use regular expressions. However, I haven't found any guides on the "right" way to write a parser. (This, by the way, is something I'm attempting more as a learning