How do I parse HTML using regular expressions in C#?

野的像风 2021-01-28 13:49

How do I parse HTML using regular expressions in C#?

For example, given HTML code

 t1      sp         


        
5 Answers
  • 2021-01-28 14:08

    I used this regex in C#, and it works. Thanks for all your answers.

    <([^<]*)>|([^<]*)
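
    A minimal sketch of how that pattern could be applied with Regex.Matches, assuming the goal is to split the input into tag tokens and text tokens. The Tokenize helper and the sample input are hypothetical, not taken from the original post, and the code assumes C# 9 or later for top-level statements.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // Hypothetical helper: splits markup into (kind, value) tokens using the pattern above.
    static List<(string Kind, string Value)> Tokenize(string html)
    {
        var tokens = new List<(string Kind, string Value)>();
        foreach (Match m in Regex.Matches(html, @"<([^<]*)>|([^<]*)"))
        {
            if (m.Length == 0) continue;              // the second alternative can match an empty string
            if (m.Groups[1].Success)
                tokens.Add(("tag", m.Groups[1].Value));
            else if (m.Groups[2].Value.Trim().Length > 0)
                tokens.Add(("text", m.Groups[2].Value.Trim()));   // drop whitespace-only runs
        }
        return tokens;
    }

    // Sample input is illustrative only.
    foreach (var (kind, value) in Tokenize("<b> t1 </b><span>sp</span>"))
        Console.WriteLine($"{kind}: {value}");
    ```

    This only tokenizes flat markup. It does not track nesting, and a > inside an attribute value or a comment will confuse it.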
    
  • 2021-01-28 14:10

    This has already been answered literally dozens of times, but it bears repeating: regular expressions can only parse regular languages; that is why they are called regular expressions. HTML is not a regular language (as probably every college student in the last decade has proved at least once), and therefore cannot be parsed by regular expressions.
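
    To make the point concrete, here is a small sketch of how a simple pattern goes wrong once tags nest or repeat. The sample strings and the two patterns are illustrative assumptions, not from the question, and the code assumes C# 9 or later for top-level statements.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    string nested = "<div>a<div>b</div>c</div>";
    string siblings = "<div>a</div><div>b</div>";

    // Lazy quantifier: stops at the first closing tag, which belongs to the inner div.
    string lazy = Regex.Match(nested, "<div>(.*?)</div>").Groups[1].Value;

    // Greedy quantifier: runs to the last closing tag, swallowing both sibling divs.
    string greedy = Regex.Match(siblings, "<div>(.*)</div>").Groups[1].Value;

    Console.WriteLine(lazy);    // a<div>b
    Console.WriteLine(greedy);  // a</div><div>b
    ```

    Neither quantifier can pair each opening tag with its own closing tag, because doing so requires counting nesting depth.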

  • 2021-01-28 14:17

    You might want to simply use string functions, treating < and > as the delimiters for parsing.
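
    A rough sketch of that idea using IndexOf and Substring. The SplitOnAngleBrackets helper and the sample input are hypothetical, and the code assumes C# 9 or later for top-level statements.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper: walks the string, treating '<' and '>' as delimiters.
    static List<string> SplitOnAngleBrackets(string html)
    {
        var parts = new List<string>();
        int pos = 0;
        while (pos < html.Length)
        {
            int open = html.IndexOf('<', pos);
            if (open < 0) open = html.Length;         // no more tags: the rest is text
            string text = html.Substring(pos, open - pos).Trim();
            if (text.Length > 0) parts.Add(text);
            if (open == html.Length) break;

            int close = html.IndexOf('>', open);
            if (close < 0) break;                     // unterminated tag: stop rather than guess
            parts.Add(html.Substring(open, close - open + 1));
            pos = close + 1;
        }
        return parts;
    }

    // Sample input is illustrative only.
    Console.WriteLine(string.Join(" | ", SplitOnAngleBrackets("<b> t1 </b><span>sp</span>")));
    ```

    Like the regex approach, this assumes that > never appears inside an attribute value, comment, or script block.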

  • 2021-01-28 14:27

    Regular expressions are a very poor way to parse HTML. If you can guarantee that your input will be well-formed XML (i.e. XHTML), you can use XmlReader to read the elements and then print them out however you like.
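
    A minimal sketch of the XmlReader approach, assuming the input is a well-formed fragment with a single root element and no DOCTYPE. The ReadNodes helper and the sample markup are hypothetical, and the code assumes C# 9 or later for top-level statements.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;

    // Hypothetical helper: lists the start tags, text nodes, and end tags in document order.
    static List<string> ReadNodes(string xhtml)
    {
        var nodes = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xhtml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        nodes.Add("start " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        nodes.Add("text " + reader.Value.Trim());
                        break;
                    case XmlNodeType.EndElement:
                        nodes.Add("end " + reader.Name);
                        break;
                }
            }
        }
        return nodes;
    }

    // Sample input is illustrative only; it needs one root element to be well-formed XML.
    foreach (var node in ReadNodes("<div><b> t1 </b><span>sp</span></div>"))
        Console.WriteLine(node);
    ```

    If the input is not well-formed, reader.Read() throws an XmlException, so this only suits markup you control.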

  • 2021-01-28 14:31

    You might want to try the Html Agility Pack, http://www.codeplex.com/htmlagilitypack. It even handles malformed HTML.
