Regular Expressions vs XPath when parsing HTML text

情书的邮戳 2021-01-19 18:12

I want to parse an HTML text and find specific parts, for example the text in the 3rd div of the 1st row and 2nd column of a table. I

4 answers
  •  生来不讨喜
    2021-01-19 18:51

    I think XPath is the primary option for traversing XML-like documents. With regular expressions, it is up to you to handle every way a tag can be written (multiple spaces, double quotes, single quotes, no quotes, on one line, across multiple lines, with inner data, without inner data, etc.). With XPath, all of this is transparent to you, and it has many features (accessing a node by index, selecting by attribute values, selecting siblings, and MANY others).
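
As a rough illustration of the point above, here is a minimal sketch of the lookup from the question (3rd div in the 1st row, 2nd column of a table) using the limited XPath support in Python's standard `xml.etree.ElementTree`. The sample markup and the helper name `third_div_text` are made up for this example, and the sketch assumes the input is well-formed XML; real-world HTML is often not, and would likely need a lenient HTML parser in front of the XPath step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (not from the original question). The divs are
# deliberately written in different styles: single quotes, extra spaces,
# and a tag split across lines.
SAMPLE = """
<table>
  <tr>
    <td><div>a1</div></td>
    <td>
      <div class='x'>first</div>
      <div   class="y" >second</div>
      <div
         class="z">third</div>
    </td>
  </tr>
  <tr>
    <td><div>b1</div></td>
    <td><div>b2</div></td>
  </tr>
</table>
"""


def third_div_text(markup):
    """Return the text of the 3rd div in row 1, column 2, or None if absent."""
    # Assumption: the markup is well-formed XML with <table> as its root.
    root = ET.fromstring(markup.strip())
    # XPath positions are 1-based: 1st tr, 2nd td, 3rd div.
    node = root.find("./tr[1]/td[2]/div[3]")
    return None if node is None else node.text


result = third_div_text(SAMPLE)
print(result)
```

The path expression does not change with how each tag is spaced or quoted, which is the variation a regular expression would have to account for explicitly.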

    See how powerful it can be at http://www.w3schools.com/xpath/.

    EDIT: See also "How do HTML parses work if they're not using regexp?"
