I just came across an example of a regexp beating an HTML parser. I needed to extract some information from a long page (8231 lines, 400 KB), and I first tried simple_html_dom. Since I got stuck on the problem reported in this question, I went for the alternative approach and realized that I only needed the information contained in the first 416 lines of that file (about 5% of the total), so loading the whole DOM into memory looked like a huge waste of resources.
I still don't know why simple_html_dom fails on that page, so I can't really compare the performance of the two solutions, but the regexp version reads only as many lines as it needs (up to the end of the <ul> I'm interested in, and no more) and is very quick.
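
To make the idea concrete, here is a minimal sketch of the "read line by line and stop at the closing </ul>" approach. It is written in Python for illustration, not in PHP, and it is not the code I actually ran; the same loop could be built in PHP with fgets() and preg_match(). The id "results", the one-<li>-per-line layout and the function name extract_items are all assumptions made up for the example, and the sketch would stop too early if the target list contained a nested <ul>.

```python
import io
import re

# Assumed markup (hypothetical): the target list opens with <ul id="results">
# and every item sits on a single line as <li>...</li>.
UL_OPEN = re.compile(r'<ul\b[^>]*\bid="results"', re.IGNORECASE)
UL_CLOSE = re.compile(r'</ul\s*>', re.IGNORECASE)
LI_ITEM = re.compile(r'<li\b[^>]*>(.*?)</li\s*>', re.IGNORECASE)


def extract_items(lines):
    """Return (items, lines_read), reading no further than the target </ul>."""
    items = []
    inside = False
    lines_read = 0
    for line in lines:
        lines_read += 1
        if not inside:
            if UL_OPEN.search(line):
                inside = True   # found the list; start collecting from this line
            else:
                continue        # still before the list; skip this line
        items.extend(LI_ITEM.findall(line))
        if UL_CLOSE.search(line):
            break               # end of the list: the rest of the page is never read
    return items, lines_read


# Usage with an in-memory stand-in for the real page.
sample = io.StringIO(
    '<html><body>\n'
    '<ul id="results">\n'
    '<li>first</li>\n'
    '<li>second</li>\n'
    '</ul>\n'
    '<p>thousands of lines that never get read</p>\n'
    '</body></html>\n'
)
items, lines_read = extract_items(sample)
print(items, lines_read)  # ['first', 'second'] 5
```

Passing an open file object instead of the StringIO gives the same behaviour on a real page: iteration stops at the closing tag, so only the lines up to that point are pulled into memory.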