How do I parse HTML using regular expressions in C#?
For example, given HTML code
t1 sp
I used this regex in C#, and it works. Thanks for all your answers.
<([^<]*)>|([^<]*)
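For anyone wanting to try the pattern above, here is a minimal sketch of how it might be applied with System.Text.RegularExpressions. The class name RegexTokenizer, the TAG:/TEXT: labels, and the sample input are made up for illustration and are not from the original post. The second alternative can match the empty string, so zero-length matches are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a string into tag and text tokens
// using the pattern quoted in the post.
public static class RegexTokenizer
{
    // Group 1 captures the inside of <...>; group 2 captures a run of text.
    private static readonly Regex Token = new Regex(@"<([^<]*)>|([^<]*)");

    public static List<string> Tokenize(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(html))
        {
            // ([^<]*) also matches the empty string (e.g. at end of input).
            if (m.Length == 0) continue;

            tokens.Add(m.Groups[1].Success
                ? "TAG:" + m.Groups[1].Value
                : "TEXT:" + m.Groups[2].Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Sample input is an assumption, chosen only to exercise the pattern.
        foreach (var token in Tokenize("<p>hello <b>world</b></p>"))
            Console.WriteLine(token);
    }
}
```

This only tokenizes; it does not check nesting, and because [^<]* is greedy, a stray > inside text (as in "<p>a > b") gets swallowed into the preceding tag.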
This has already been answered literally dozens of times, but it bears repeating: regular expressions can only parse regular languages; that is why they are called regular expressions. HTML is not a regular language (as probably every college student in the last decade has proved at least once), and therefore cannot be parsed by regular expressions.
You might want to simply use string functions, treating < and > as the delimiters for parsing.
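A rough sketch of that idea, using only String.IndexOf and String.Substring. The class name StringScanner and the TAG:/TEXT: labels are hypothetical, and the scan assumes no > appears inside attribute values, comments, or scripts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: walks the string once, using '<' and '>' as markers.
public static class StringScanner
{
    public static List<string> Split(string html)
    {
        var parts = new List<string>();
        int pos = 0;
        while (pos < html.Length)
        {
            int open = html.IndexOf('<', pos);
            if (open < 0)
            {
                // No more tags: the rest is text.
                parts.Add("TEXT:" + html.Substring(pos));
                break;
            }
            if (open > pos)
                parts.Add("TEXT:" + html.Substring(pos, open - pos));

            int close = html.IndexOf('>', open + 1);
            if (close < 0)
            {
                // Unterminated tag: treat the remainder as text.
                parts.Add("TEXT:" + html.Substring(open));
                break;
            }
            parts.Add("TAG:" + html.Substring(open + 1, close - open - 1));
            pos = close + 1;
        }
        return parts;
    }

    public static void Main()
    {
        // Sample input is an assumption for illustration.
        foreach (var part in Split("<p>hello <b>world</b></p>"))
            Console.WriteLine(part);
    }
}
```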
Regular expressions are a very poor way to parse HTML. If you can guarantee that your input will be well-formed XML (i.e. XHTML), you can use XmlReader to read the elements and then print them out however you like.
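As a sketch of the XmlReader route, assuming the input really is well-formed and carries no DOCTYPE (XmlReader's default settings prohibit DTDs, and HTML-only entities such as &nbsp; are undefined without one). The class name XhtmlWalker and the START:/TEXT:/END: labels are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists the elements and text nodes of a well-formed fragment.
public static class XhtmlWalker
{
    public static List<string> Walk(string xhtml)
    {
        var items = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xhtml)))
        {
            // Read() throws XmlException if the input is not well-formed.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Self-closing elements (<br/>) produce no EndElement node.
                        items.Add("START:" + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        items.Add("TEXT:" + reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        items.Add("END:" + reader.Name);
                        break;
                }
            }
        }
        return items;
    }

    public static void Main()
    {
        // Sample input is an assumption for illustration.
        foreach (var item in Walk("<p>hello <b>world</b></p>"))
            Console.WriteLine(item);
    }
}
```

Unlike the regex, this rejects mismatched tags outright, which is the point: it only fits input you can guarantee is XML.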
You might want to try the Html Agility Pack, http://www.codeplex.com/htmlagilitypack. It even handles malformed HTML.