I use VB.NET and would like to add http://
to all links that doesn\'t already start with http://, https://, ftp:// and so on.
\"I want to add http h
In PHP (should translate somewhat easily)
$text = preg_replace('/href="(?:(http|ftp|https)\:\/\/)?([^"]*)"/', 'href="http://$1"', $text);
If you aren't concerned with potentially messing up local links, and you can always guarantee that the strings will be fully qualified domain names, then you can simply use the contains method:
Dim myUrl as string = "someUrlString".ToLower()
If Not myUrl.Contains("http://") AndAlso Not myUrl.Contains("https://") AndAlso Not myUrl.Contains("ftp://") Then
'Execute your logic to prepend the proper protocol
myUrl = "http://" & myUrl
End If
Keep in mind this omits a lot of holes regarding the checking of which protocol should be used in the addition and if the url is relative or not.
Edit: I chose specifically not to offer a RegEx solution since this is a simple check and RegEx is a little heavy for it (IMO).
Quote RFC 1738:
"Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http")."
Excellent! A regex to match:
/^[a-zA-Z0-9+.-]+:\/\//
If that matches your href string, continue on. If not, prepend "http://". Remaining sanity checks are yours unless you ask for specific details. Do note the other commenters' thoughts about relative links.
EDIT: I'm starting to suspect that you've asked the wrong question... that you perhaps don't have anything that splits the text up into the individual tokens you need to handle it. See Looking for C# HTML parser
EDIT: As a blind try at ignoring all and just attacking the text, using case insensitive matching,
/(<a +href *= *")(.*?)(" *>)/
If the second back-reference matches /^[a-zA-Z0-9+.-]+:\/\//
, do nothing. If it does not match, replace it with
$1 + "http://" + $2 + $3
This isn't C# syntax, but it should translate across without too much effort.
C#
result = new Regex("(href=\")([^(http|https|ftp)])", RegexOptions.IgnoreCase).Replace(input, "href=\"//$2");