Regular expressions can be used with HTML only in very specific, simple cases. For example, if the text contains only a single tag, you can use "href\\s*=\\s*\"(?<url>.*?)\"" to extract the URL, e.g.:
var url = Regex.Match(text, "href\\s*=\\s*\"(?<url>.*?)\"").Groups["url"].Value;
This pattern will return:
https://website.com/-id1
This regex doesn't do anything fancy. It looks for href= with optional whitespace around the equals sign, then captures everything between the first double quote and the next one in a non-greedy manner (.*?). The match is captured in the named group url.
Anything fancier gets complex quickly. For example, supporting both single and double quotes requires special handling to avoid starting on a single quote and ending on a double quote. The string could also contain multiple tags that use both types of quotes.
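To illustrate the kind of special handling involved, here is a minimal sketch that uses a backreference so the closing quote must match the opening one, and Regex.Matches to handle several tags. The class name HrefExtractor and the group name q are made up for this example. It still breaks on unquoted attributes, commented-out markup and similar cases.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HrefExtractor
{
    // (?<q>["']) captures the opening quote character;
    // \k<q> requires the same character to close the value.
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractHrefs(string html)
    {
        var urls = new List<string>();
        // Matches returns every non-overlapping match, one per href attribute.
        foreach (Match m in HrefPattern.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Calling HrefExtractor.ExtractHrefs on text that mixes href="..." and href='...' returns each URL without its surrounding quotes, and a double quote inside a single-quoted value no longer ends the match early.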
For complex parsing, it is better to use a library like AngleSharp or HtmlAgilityPack.
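As a rough sketch of the library route, assuming the HtmlAgilityPack NuGet package is referenced in the project, something like the following collects the href of every anchor. The class name HrefParser is made up for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of HtmlAgilityPack.
public static class HrefParser
{
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        // XPath: every <a> element that has an href attribute.
        // SelectNodes returns null (not an empty list) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return urls;
        }

        foreach (var anchor in anchors)
        {
            // The second argument is the fallback used when the attribute is missing.
            urls.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return urls;
    }
}
```

The parser deals with quote styles, attribute order and whitespace itself, so none of that has to be encoded in a pattern.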