Regular Expressions vs XPath when parsing HTML text


情书的邮戳 2021-01-19 18:12

I want to parse an HTML text and find specific parts, for example the text in the 3rd div of the 1st row and 2nd column of a table. I

4 answers

生来不讨喜 (OP)

2021-01-19 18:51

I think XPath is the primary option for traversing XML-like documents. With regular expressions, it is up to you to handle every way a tag can be written (multiple spaces, double quotes, single quotes, no quotes, on one line, across multiple lines, with inner data, without inner data, etc.). With XPath, all of this is transparent to you, and it has many features (accessing a node by index, selecting by attribute values, selecting siblings, and MANY others).
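As a rough illustration of the question's example (the text in the 3rd div of the 1st row and 2nd column of a table), here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup. The sample markup, the `id`/`class` values, and the helper name `third_div_text` are all made up for this example; for real-world, messy HTML you would likely want a proper HTML parser with full XPath support instead.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, well-formed sample markup (not from the original post).
# It mixes single and double quotes, extra spaces, and a tag split across
# lines: the variations a regular expression would have to handle by hand.
SAMPLE = """
<html><body>
  <table id='data'>
    <tr>
      <td><div>a1</div></td>
      <td
        class="cell"><div>b1</div><div>b2</div><div  >target text</div></td>
    </tr>
    <tr>
      <td><div>c1</div></td>
      <td><div>d1</div></td>
    </tr>
  </table>
</body></html>
"""


def third_div_text(markup: str) -> Optional[str]:
    """Return the text of the 3rd div in row 1, column 2 of the first table."""
    root = ET.fromstring(markup)
    # Positional predicates are 1-based: tr[1] is the first row,
    # td[2] the second cell, div[3] the third div inside that cell.
    node = root.find(".//table/tr[1]/td[2]/div[3]")
    return None if node is None else node.text


result = third_div_text(SAMPLE)
print(result)

# Selecting by attribute value works the same way, whatever quoting the
# source used ('data' is single-quoted, "cell" is double-quoted above).
root = ET.fromstring(SAMPLE)
table = root.find(".//table[@id='data']")
cell = root.find(".//td[@class='cell']")
div_count = len(cell.findall("div"))
```

The path expression stays the same no matter how the tags are spaced or quoted in the source, which is the point the answer makes about XPath being "transparent" compared with a hand-written regular expression.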

See how powerful it can be at http://www.w3schools.com/xpath/.

EDIT: See also How do HTML parses work if they're not using regexp?
