Don't use regex to parse HTML (as @hsz mentioned). See why: RegEx match open tags except XHTML self-contained tags. Instead, you can use an HTML parser such as HtmlAgilityPack:
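using HtmlAgilityPack;

var html = @"<a href=""http://msdn.microsoft.com/en-us/library/Aa538627.aspx"" onclick=""trackClick(this, '117', 'http\x3a\x2f\x2fmsdn.microsoft.com\x2fen-us\x2flibrary\x2fAa538627.aspx', '15');"">ToolStripItemOwnerCollectionUIAdapter.GetInsertingIndex Method ...</a>";
// Parse the markup into a DOM instead of matching it with a regex
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
// XPath "//a": the first <a> element anywhere in the document, or null if there is none
var link = document.DocumentNode.SelectSingleNode("//a");
if (link != null)
{
    // GetAttributeValue returns the fallback ("") instead of throwing when href is missing
    var href = link.GetAttributeValue("href", string.Empty);
    // The text between the opening and closing tags
    var innerText = link.InnerText;
}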
Now href contains "http://msdn.microsoft.com/en-us/library/Aa538627.aspx", and innerText (the string between the tags) contains "ToolStripItemOwnerCollectionUIAdapter.GetInsertingIndex Method ...".
Isn't it easier than regex?
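The code above only looks at the first link. If the page contains several anchors, a minimal sketch like the following could collect all of them. It assumes HtmlAgilityPack is referenced (for example via its NuGet package); LinkExtractor and ExtractLinks are hypothetical names used only for illustration. Note that, by default, SelectNodes returns null rather than an empty collection when nothing matches, so the result has to be checked before iterating.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack): collects href and text of every <a> element.
public static class LinkExtractor
{
    public static List<(string Href, string Text)> ExtractLinks(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var result = new List<(string Href, string Text)>();

        // By default SelectNodes returns null (not an empty collection) when the XPath matches nothing.
        var anchors = document.DocumentNode.SelectNodes("//a");
        if (anchors == null)
        {
            return result;
        }

        foreach (var anchor in anchors)
        {
            // Fall back to an empty string instead of throwing when the anchor has no href.
            var href = anchor.GetAttributeValue("href", string.Empty);
            // InnerText may still contain HTML entities such as &amp;; DeEntitize decodes them.
            var text = HtmlEntity.DeEntitize(anchor.InnerText);
            result.Add((href, text));
        }

        return result;
    }
}
```

For the markup in the question, ExtractLinks(html) would return a single entry whose Href is the MSDN URL. Using GetAttributeValue with a fallback keeps anchors without an href (such as named anchors) from causing a NullReferenceException.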