Regular expression to find URLs within a string

被撕碎了的回忆 2020-11-22 14:18

Does anyone know of a regular expression I could use to find URLs within a string? I've found a lot of regular expressions on Google for determining if an entire string is

27 answers
  • 2020-11-22 14:48

    Matching a URL in text should not be so complex:

    (?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)

    https://regex101.com/r/wewpP1/2
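A minimal Python sketch of how the pattern above behaves with `re.findall`. The sample sentence and the variable names are assumptions for illustration, not part of the original answer.

```python
import re

# Pattern from the answer above, written as a Python raw string.
URL_PATTERN = re.compile(r"(?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)")

# Hypothetical sample text, not taken from the original answer.
sample = "Docs live at https://example.com/docs and www.example.org today"

# The only group is non-capturing, so findall returns the whole matches.
matches = URL_PATTERN.findall(sample)
print(matches)

# The tail [^ \n]+ stops only at a space or newline, so punctuation that
# directly follows a URL becomes part of the match.
with_comma = URL_PATTERN.findall("see http://test.com/x, ok")
print(with_comma)
```

Because the final part of the pattern stops only at whitespace, a trailing comma or period may need to be stripped from each match afterwards.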

  • 2020-11-22 14:48

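    I used this (the doubled backslashes are for embedding it in a string literal; the ^ and $ anchors mean it only matches when the whole string is a URL):

    ^(https?:\\/\\/([a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+)(\\.[a-zA-Z0-9\\/\\=\\-\\_\\?]+)?)$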
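A small Python sketch of what the `^` and `$` anchors do to this pattern. It is an adaptation under stated assumptions: single backslashes in a raw string, `A-Z` as the upper-case range, and two made-up input strings.

```python
import re

# Python adaptation of the pattern above: single backslashes in a raw
# string, with A-Z as the upper-case range (an assumption of this sketch).
WHOLE_URL = re.compile(
    r"^(https?:\/\/([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)(\.[a-zA-Z0-9\/\=\-\_\?]+)?)$"
)

# Hypothetical inputs for illustration.
alone = "http://www.example.com/page?id=1"
embedded = "Visit http://www.example.com/page?id=1 today"

# The anchors tie the match to the start and end of the input.
alone_matches = bool(WHOLE_URL.search(alone))        # input is only a URL
embedded_matches = bool(WHOLE_URL.search(embedded))  # URL inside other text

print(alone_matches, embedded_matches)
```

With the anchors in place, the pattern validates a string that is nothing but a URL and does not find a URL embedded in other text; dropping `^` and `$` is one way to turn it into a search pattern.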
    
  • 2020-11-22 14:49

    I used the regular expression below to find URLs in a string:

    /(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/
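If this pattern is used from Python, note that it contains two capturing groups, which changes what `re.findall` returns. The sketch below assumes the surrounding `/` delimiters are dropped and uses a made-up sample string.

```python
import re

# Pattern from the answer above (delimiters removed), as a raw string.
PATTERN = re.compile(r"(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?")

# Hypothetical sample text for illustration.
text = "Mirror: http://example.com/a and https://example.org"

# With capturing groups, findall returns one tuple of groups per match,
# not the matched URL itself.
groups = PATTERN.findall(text)

# finditer exposes the whole match through group(0).
urls = [m.group(0) for m in PATTERN.finditer(text)]

print(groups)
print(urls)
```

Rewriting the groups as non-capturing, `(?:...)`, would let `findall` return the full URLs directly.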
    
  • 2020-11-22 14:49

    Depending on what you need, this is a slight improvement on, or adjustment to, Rajeev's answer:

    ([\w\-_]+(?:(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&:/~\+#]*[A-Z\-\@?^=%&/~\+#]){2,6}?
    

    See here for an example of what it does and does not match.

    I got rid of the check for "http" etc. because I wanted to catch URLs without it. I added slightly to the regex to catch some obfuscated URLs (i.e. where users write [dot] instead of a "."). Finally, I replaced "\w" with "A-Z" and changed "{2,3}" to reduce false positives such as "v2.0" and "moo.0dd".

    Any improvements on this are welcome.
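A rough Python sketch of the behaviour described above. The `re.IGNORECASE` flag is an assumption (the pattern lists only upper-case `A-Z`, so lower-case input needs it), and the helper `first_match` and the test strings are hypothetical.

```python
import re

# The answer's pattern as a raw string. re.IGNORECASE is an assumption of
# this sketch: the pattern itself lists only upper-case A-Z.
OBFUSCATED_URL = re.compile(
    r"([\w\-_]+(?:(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))"
    r"([A-Z\-\.,@?^=%&:/~\+#]*[A-Z\-\@?^=%&/~\+#]){2,6}?",
    re.IGNORECASE,
)

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = OBFUSCATED_URL.search(text)
    return m.group(0) if m else None

# Hypothetical inputs: a plain URL without a scheme, an obfuscated one,
# and the two false positives mentioned in the answer.
for candidate in ["example.com/path", "example [dot] com", "v2.0", "moo.0dd"]:
    print(candidate, "->", first_match(candidate))
```

The last two inputs are rejected because the character classes after the dot contain no digits, which is also why digits in a path would end a match early.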

  • 2020-11-22 14:50

    import re

    text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
    Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd
    The code below catches all urls in text and returns urls in list."""

    # Find every URL-like token; the scheme (http, https, ftp) is optional.
    urls = re.findall(r'(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
    print(urls)
    

    Output:

    [
        'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
        'www.google.com', 
        'facebook.com',
        'http://test.com/method?param=wasd'
    ]
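One trade-off of making the scheme optional is that any dotted token counts as a URL. The sketch below reuses the same pattern on a made-up sentence to show this; the sample text is an assumption for illustration.

```python
import re

# Same pattern as in the answer above.
PATTERN = r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+'

# Hypothetical sample text containing a version number and a bare domain.
sample = "Release v2.0 is out, see example.com"

# \w also matches digits, so "v2.0" satisfies the pattern as well.
hits = re.findall(PATTERN, sample)
print(hits)
```

If version numbers or similar tokens appear in the input, a stricter rule for the last label (letters only, for instance) may be worth considering.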
    
  • 2020-11-22 14:50

    I think this regex pattern handles precisely what you want:

    /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/
    

    and this is an example snippet to extract URLs:

    // The Regular Expression filter
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    
    // The Text you want to filter for urls
    $text = "The text you want  https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string to filter goes here.";
    
    // Collect every URL in the text; $matches[0] holds the full matches
    preg_match_all($reg_exUrl, $text, $matches);
    var_dump($matches);
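For comparison, a rough Python equivalent of the PHP snippet, assuming the `/` delimiters are dropped and the groups are made non-capturing so that `findall` returns whole URLs. The second input is a made-up example that shows the effect of the `{2,3}` limit on the top-level domain.

```python
import re

# The answer's pattern without PHP delimiters; groups are non-capturing
# here (an adaptation) so findall returns whole matches.
URL_RE = re.compile(
    r"(?:http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(?:\/\S*)?"
)

# Text from the PHP snippet above.
text = ("The text you want  https://stackoverflow.com/questions/6038061/"
        "regular-expression-to-find-urls-within-a-string to filter goes here.")
found = URL_RE.findall(text)
print(found)

# Hypothetical input with a four-letter top-level domain: {2,3} consumes
# only three letters, so the match is cut short and the path is lost.
truncated = URL_RE.findall("ftp://files.example.info/pub")
print(truncated)
```

Widening the quantifier (for example `{2,}`) is one way to accept longer top-level domains, at the cost of a looser match.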
    