C# regex pattern to extract URLs from a given string - not only full HTML URLs but bare links as well

一个人的身影 2020-12-01 02:16

I need a regex that will do the following:

Extract all strings that start with http://
Extract all strings that start with www.

So i ne

1 answer
  • 2020-12-01 02:51

    You can handle this with a fairly simple regular expression, or use a more traditional string-splitting + LINQ approach.

    Regex

    // Matches tokens that start with http://, https:// or www. and run up to the next whitespace.
    var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
    foreach (Match m in linkParser.Matches(rawString))
        MessageBox.Show(m.Value); // Windows Forms; use Console.WriteLine in a console app
    

    Pattern explanation:

    \b         - matches a word boundary: the position between a word character (letter, digit, underscore) and a non-word character or the start/end of the string
    (?:        - begins a group; the ?: means the group is not captured
    https?://  - matches http:// or https:// (the '?' after the "s" makes it optional)
    |          - OR
    www\.      - matches the literal string www. (the \. means a literal ".")
    )          - ends the group
    \S+        - matches one or more non-whitespace characters
    \b         - requires the match to end at a word boundary, so trailing punctuation such as a final period or comma is left out
    

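    Basically, the pattern looks for strings that start with http://, https:// or www. (the (?:https?://|www\.) group) and then matches the non-whitespace characters that follow, ending at the last word boundary before the next whitespace. Note that http://www.monstermmorpg.commerged is returned in full, because nothing separates the URL from the text merged into it.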
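To see the effect of the closing \b on trailing punctuation, here is a minimal sketch that wraps the same pattern in a helper. The names `LinkExtractor` and `ExtractLinks`, and the example.com/example.org inputs, are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the pattern from the answer so the matches
// can be collected into an array instead of shown one by one.
public static class LinkExtractor
{
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every non-overlapping match, in order of appearance.
    public static string[] ExtractLinks(string text)
    {
        return LinkParser.Matches(text)
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Calling `LinkExtractor.ExtractLinks("See www.example.com, or visit http://example.org/page.")` should give `www.example.com` and `http://example.org/page`: the comma and the final period are dropped because \S+ backtracks until the match ends on a word boundary.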

    Traditional String Options

    // Requires "using System.Linq;". Unlike the regex above, StartsWith is case-sensitive here
    // and any trailing punctuation stays attached to the token.
    var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
    var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
    foreach (string s in links)
        MessageBox.Show(s);
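If the split-based version should behave more like the regex with RegexOptions.IgnoreCase, one possible variation is to compare the prefixes case-insensitively. This is only a sketch: `SplitLinkExtractor`, `ExtractLinks` and the sample input are hypothetical names, and it still keeps trailing punctuation because it never looks past the prefix.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the split + LINQ approach from the answer,
// with an ordinal, case-insensitive prefix check.
public static class SplitLinkExtractor
{
    private static readonly string[] Prefixes = { "http://", "https://", "www." };
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public static string[] ExtractLinks(string text)
    {
        return text
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => Prefixes.Any(
                p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToArray();
    }
}
```

For the input `"go WWW.Example.com now HTTP://example.org."` this should return `WWW.Example.com` and `HTTP://example.org.` (note the retained final period), which is the main behavioral difference from the regex version.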
    