I'm looking for a .NET regular expression to extract all the URLs from a webpage, but haven't found one comprehensive enough to cover all the different ways you can spec
All http: and mailto: links (the match includes the surrounding quotes):
(["'])(mailto:|http:).*?\1
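To see what this pattern does and does not pick up, here is a small sketch. It uses Python's `re` module instead of .NET purely so it is easy to run (that is my assumption, not part of the original answer); the pattern uses only a lazy quantifier and a backreference, which work the same way in both engines. The sample markup and the names `html`, `pattern` and `matches` are made up for illustration.

```python
import re

# Hypothetical sample markup: one mailto link, one http link, one https link.
html = (
    '<a href="mailto:me@example.com">mail</a> '
    "<a href='http://example.com/'>site</a> "
    '<a href="https://example.com/">secure</a>'
)

# Same pattern as above: opening quote, then mailto: or http:, then the
# shortest run of characters up to the matching closing quote.
pattern = re.compile(r"""(["'])(mailto:|http:).*?\1""")

# group(0) is the whole match, quotes included.
matches = [m.group(0) for m in pattern.finditer(html)]
print(matches)
```

Note that the https link is skipped, because the alternation only lists `mailto:` and `http:`. Changing it to `(mailto:|https?:)` would be one way to include it.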
All links, including relative ones, that are called by href or src.
#Matches things in single or double quotes, but not the quotes themselves
(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)
#Matches things in either double or single quotes, including the quotes.
(["'])((?<=href=")|(?<=src=")).*?\1
The second one will only get you links that use double quotes, however.
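A quick way to check the difference between the two href/src patterns is to run both over the same input. This is only a sketch, written with Python's `re` module rather than .NET so it can be run anywhere (an assumption on my part); both patterns use fixed-width lookbehinds and a backreference, which Python also accepts. The sample HTML and the variable names are invented.

```python
import re

# Hypothetical sample markup mixing quote styles and relative links.
html = (
    '<a href="page.html">a</a> '
    "<img src='img/logo.png'> "
    "<a href='http://example.com/'>b</a>"
)

# First pattern: the attribute value only, with either quote style.
inner = re.compile(r"""(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)""")

# Second pattern: the value including its quotes, double quotes only.
quoted = re.compile(r"""(["'])((?<=href=")|(?<=src=")).*?\1""")

inner_links = [m.group(0) for m in inner.finditer(html)]
quoted_links = [m.group(0) for m in quoted.finditer(html)]

print(inner_links)
print(quoted_links)
```

The first pattern returns all three values; the second returns only the double-quoted one, because its lookbehinds hard-code `"`.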
URLs? As in images/scripts/css/etc.?
%href="([^"]*)"%
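From the RegexBuddy library:
The final character class makes sure that if a URL is part of some text, punctuation such as a comma or full stop after the URL is not interpreted as part of the URL. The character classes only list A-Z, so use it with the case-insensitive option (RegexOptions.IgnoreCase in .NET).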
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
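Here is a small sketch of how the trailing-punctuation handling plays out on running text. It is written with Python's `re` module rather than .NET (my assumption, just to keep it runnable); the pattern itself is unchanged and is compiled case-insensitively because its classes only list `A-Z`. The sample sentence and the names `url_re`, `text` and `urls` are made up.

```python
import re

# The RegexBuddy pattern, compiled case-insensitively.
url_re = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,
)

# Hypothetical text where each URL is followed by punctuation.
text = "See http://example.com/page?id=1, or ftp://files.example.org/pub."

# Use group(0): the pattern has a capturing group for the scheme, so
# findall() would return only "http" / "ftp" instead of the full URL.
urls = [m.group(0) for m in url_re.finditer(text)]
print(urls)
```

The comma and the full stop are left out because the last character of each match has to come from the final class, which omits `?`, `!`, `:`, `,`, `.` and `;`.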