When is it wise to use regular expressions with HTML? [closed]

再見小時候

10 answers

南方客

2020-12-10 16:22

You can use regular expressions when you are either parsing HTML you control or writing a parser for one specific HTML page. You should not use regular expressions when trying to build a universal parser.
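
A minimal Python sketch of the "HTML you control" case. The markup and the `price` class are hypothetical; the regex is only safe because the page is assumed to emit every price in exactly this form.

```python
import re

# Hypothetical page whose markup we generate ourselves, so its shape is fixed:
# every price appears exactly as <span class="price">NUMBER</span>.
PRICE_RE = re.compile(r'<span class="price">([0-9]+(?:\.[0-9]{2})?)</span>')


def extract_prices(html: str) -> list[str]:
    """Return every price string from markup with the known, fixed shape."""
    return PRICE_RE.findall(html)


page = (
    '<ul>'
    '<li>Widget <span class="price">9.99</span></li>'
    '<li>Gadget <span class="price">24.50</span></li>'
    '</ul>'
)
print(extract_prices(page))  # ['9.99', '24.50']
```

The brittleness is the catch: an extra attribute or a space before `>` makes the pattern silently match nothing, which is acceptable only because the markup is under your control.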

囚心锁ツ

2020-12-10 16:24

When the set of HTML you're parsing with a regexp is known to conform to some pattern, e.g. you know there's no commented-out HTML and no other complex scenarios.

For example, I often preach that you shouldn't use regexps for HTML, but if I have a set of HTML that I'm familiar with, that is straightforward, and that I can easily check after manipulation, then I have no qualms about using a regexp for it.
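
To illustrate why the "no commented-out HTML" condition matters, here is a short Python sketch (the link pattern and sample markup are hypothetical). A naive link regex also picks up links inside comments; stripping comments first is only a crude fix that assumes no `<!--` appears inside scripts or attribute values.

```python
import re

LINK_RE = re.compile(r'<a href="([^"]*)"')
# Non-greedy, with DOTALL so comments spanning several lines are removed too.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)


def naive_links(html: str) -> list[str]:
    """The regex has no notion of comments, so commented-out links match too."""
    return LINK_RE.findall(html)


def links_without_comments(html: str) -> list[str]:
    """Crude workaround: drop comments before matching."""
    return LINK_RE.findall(COMMENT_RE.sub('', html))


sample = '''
<a href="/live">Live link</a>
<!-- <a href="/old">Removed link</a> -->
'''
print(naive_links(sample))             # ['/live', '/old']
print(links_without_comments(sample))  # ['/live']
```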

情歌与酒

2020-12-10 16:24

I think the best answer here is: regular expressions are the right tool except for when they aren't.

I think if you can cleanly and effectively solve your problem using regex, then go for it. But I've seen far too many regex hacks written because the programmer or web designer was just plain lazy.

Regex is powerful and one of the best tools a programmer can learn, but you also need to learn when to use it and when to use something different.
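
As one example of "something different", Python's standard-library `html.parser` module tokenizes tags, attributes and comments for you. This is a sketch of my own choosing, not a tool the answer names, and the sample markup is hypothetical. A regex like `<a href="([^"]*)"` would miss both real links below (uppercase tag, single quotes, reordered attributes), while the parser handles them and skips the commented-out one.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags using the stdlib tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '''
<A HREF='/single-quoted' class="x">one</A>
<a class="y" href="/attr-order">two</a>
<!-- <a href="/commented-out">three</a> -->
'''
print(collect_links(sample))  # ['/single-quoted', '/attr-order']
```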

礼貌的吻别

2020-12-10 16:27

I just found an example of a regexp beating an HTML parser. I needed to extract some information from a long page (8231 lines, 400 KB), and I first tried using simple_html_dom. Since I got stuck due to the problem reported in this question, I took the alternative approach and realized that I actually only needed information contained in the first 416 lines of that file (~4% of the total), so loading the whole DOM into memory looked like a huge waste of resources.

I still don't know why simplehtmldom fails on that page, so I can't really compare the performance of the two solutions, but the regexp version only loads as many lines as needed (up to the end of the <ul> I'm interested in and no more) and is very quick.
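
The answer's own code isn't shown (simple_html_dom is a PHP library), so here is a hypothetical Python sketch of the same idea: read the page line by line, start collecting at the target `<ul>`, and stop as soon as it closes. The `id="targets"` marker is an assumption, and the approach assumes the list contains no nested `<ul>`.

```python
import io
import re
from collections.abc import Iterable

# Hypothetical markers: the list we care about is <ul id="targets">.
START_RE = re.compile(r'<ul id="targets">')
END_RE = re.compile(r'</ul>')
ITEM_RE = re.compile(r'<li>(.*?)</li>')


def read_target_items(lines: Iterable[str]) -> list[str]:
    """Scan lines lazily and stop consuming input once the target <ul> closes."""
    items: list[str] = []
    inside = False
    for line in lines:
        if not inside:
            if START_RE.search(line):
                inside = True
            else:
                continue  # still before the list we want
        items.extend(ITEM_RE.findall(line))
        if END_RE.search(line):
            break  # nothing after the closing </ul> is processed
    return items


page = io.StringIO(
    '<html><body>\n'
    '<ul id="targets">\n'
    '  <li>alpha</li>\n'
    '  <li>beta</li>\n'
    '</ul>\n'
    '<p>thousands more lines we never need...</p>\n'
)
print(read_target_items(page))  # ['alpha', 'beta']
```

Because iteration is lazy, the loop stops at the closing `</ul>`; with a real file, buffered I/O may read slightly ahead, but the rest of the page is never parsed.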
