Regular expression to find URLs within a string

被撕碎了的回忆 2020-11-22 14:18

Does anyone know of a regular expression I could use to find URLs within a string? I've found a lot of regular expressions on Google for determining if an entire string is

27 answers
  • 2020-11-22 14:50

    I use the logic of finding the text between two dots (periods).

    The regex below works fine with Python:

    (?<=\.)[^}]*(?=\.)
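
    A minimal sketch of how this pattern behaves with Python's `re` module. The helper name `between_dots` and the sample strings are made up for illustration. Because `[^}]*` is greedy, the match runs from the first dot to the last one, so it returns the middle of a host name rather than a full URL.

```python
import re

# Pattern from the answer above: text preceded by a dot and followed by a dot.
BETWEEN_DOTS = re.compile(r"(?<=\.)[^}]*(?=\.)")

def between_dots(text):
    """Return the first match of the pattern, or None if there is none."""
    match = BETWEEN_DOTS.search(text)
    return match.group(0) if match else None

print(between_dots("visit www.example.com today"))  # example
print(between_dots("www.example.co.uk"))            # example.co
```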
    
  • 2020-11-22 14:55

    None of the solutions provided here solved the problems/use-cases I had.

    What I have provided here is the best I have found or made so far. I will update it when I find new edge cases that it doesn't handle.

    \b
      #Word cannot begin with special characters
      (?<![@.,%&#-])
      #Protocols are optional, but take them with us if they are present
      (?<protocol>\w{2,10}:\/\/)?
      #Domains have to be at least 1 character long
      ((?:\w|\&\#\d{1,5};)[.-]?)+
      #The domain ending has to be between 2 and 15 characters
      (\.([a-z]{2,15})
           #If no domain ending we want a port, only if a protocol is specified
           |(?(protocol)(?:\:\d{1,6})|(?!)))
    \b
    #Word cannot end with @ (made to catch emails)
    (?![@])
    #We accept any number of slugs, given we have a char after the slash
    (\/)?
    #If we have endings like ?=fds include the ending
    (?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)?
    #The last char cannot be one of these symbols: . , ? ! -
    (?<![.,?!-])
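
    The answer does not say which regex engine the pattern targets. As a rough sketch, here is a port to Python's `re` module in verbose mode. The one intended change is the named group syntax, `(?<protocol>...)` becoming `(?P<protocol>...)`, which is an assumption about how the original was meant to run. The name `find_urls` and the sample strings are hypothetical.

```python
import re

# Python port of the pattern above (an adaptation, not the original text):
# only the named-group syntax was changed to (?P<protocol>...).
URL_RE = re.compile(r"""
    \b
    (?<![@.,%&#-])                        # must not start right after one of these
    (?P<protocol>\w{2,10}:\/\/)?          # optional protocol, captured if present
    ((?:\w|\&\#\d{1,5};)[.-]?)+           # domain, at least one character
    (\.([a-z]{2,15})                      # domain ending of 2 to 15 letters
        |(?(protocol)(?:\:\d{1,6})|(?!))) # else a port, only if a protocol matched
    \b
    (?![@])                               # must not be followed by @ (emails)
    (\/)?
    (?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)?
    (?<![.,?!-])                          # must not end with one of these
""", re.VERBOSE)

def find_urls(text):
    """Return every substring of text matched by URL_RE."""
    return [m.group(0) for m in URL_RE.finditer(text)]

print(find_urls("Visit https://example.com/path?x=1 now."))
print(find_urls("Contact me at bob@example.com"))
```

    In this sketch the trailing period of a sentence is left out of the match, and the e-mail address is skipped because the part after `@` is rejected by the first lookbehind.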
    
  • 2020-11-22 14:55

    If you have a URL pattern, you should be able to search for it in your string. Just make sure the pattern doesn't contain the ^ and $ anchors that mark the beginning and end of the URL string. So if P is the pattern for a URL, search the string for matches of P.
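
    A small sketch of that idea in Python. The validation pattern `VALIDATE_URL` and the helpers `strip_anchors` and `find_all` are hypothetical and deliberately naive (only a leading `^` and an unescaped trailing `$` are handled); they just show why the anchors have to go before searching inside a longer string.

```python
import re

# Hypothetical whole-string validation pattern, used only for illustration.
VALIDATE_URL = r"^https?://[^\s/]+(?:/\S*)?$"

def strip_anchors(pattern):
    """Naively drop a leading ^ and an unescaped trailing $ from a pattern."""
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]
    return pattern

def find_all(pattern, text):
    """Search text for every match of pattern once its anchors are removed."""
    return [m.group(0) for m in re.finditer(strip_anchors(pattern), text)]

text = "see http://a.com and https://b.org/x ok"
print(re.search(VALIDATE_URL, text))  # None: the anchored pattern needs the whole string
print(find_all(VALIDATE_URL, text))   # ['http://a.com', 'https://b.org/x']
```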

  • 2020-11-22 14:55
    (?:vnc|s3|ssh|scp|sftp|ftp|http|https)\:\/\/[\w\.]+(?:\:?\d{0,5})|(?:mailto|)\:[\w\.]+\@[\w\.]+
    

    If you want an explanation of each part, try it on regexr.com, where you will get a detailed explanation of every character.

    The pattern is split by "|" (OR) because not all usable URIs contain "//". Each side of the alternation is where you can list the schemes you are interested in matching.
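
    A short sketch of this pattern in Python, assuming the standard `re` module. The function name `find_uris` and the sample strings are invented for the example. Note that the first branch stops at the host and optional port, so any path after the host is not part of the match.

```python
import re

# Pattern copied from the answer above: scheme://host[:port] OR mailto:user@host.
URI_RE = re.compile(
    r"(?:vnc|s3|ssh|scp|sftp|ftp|http|https)\:\/\/[\w\.]+(?:\:?\d{0,5})"
    r"|(?:mailto|)\:[\w\.]+\@[\w\.]+"
)

def find_uris(text):
    """Return every substring of text matched by URI_RE."""
    return [m.group(0) for m in URI_RE.finditer(text)]

print(find_uris("Connect via ssh://host.example:22 or write to mailto:bob@example.com"))
# ['ssh://host.example:22', 'mailto:bob@example.com']
print(find_uris("http://example.com/path"))
# ['http://example.com']
```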

  • 2020-11-22 14:55

    It's simple.

    Use this pattern: \b((ftp|https?)://)?([\w-\.]+\.(com|net|org|gov|mil|int|edu|info|me)|(\d+\.\d+\.\d+\.\d+))(:\d+)?(\/[\w-\/]*(\?\w*(=\w+)*[&\w-=]*)*(#[\w-]+)*)?

    It matches any link that contains:

    Allowed Protocols: http, https and ftp

    Allowed Domains: *.com, *.net, *.org, *.gov, *.mil, *.int, *.edu, *.info and *.me OR IP

    Allowed Ports: true

    Allowed Parameters: true

    Allowed Hashes: true
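
    A sketch of the rules above in Python. One caveat: as far as I can tell, Python's `re` rejects classes such as `[\w-\.]` as a bad character range, so in this adaptation the hyphen is moved to the end of each class (`[\w.-]`, `[\w\/-]`, `[&\w=-]`). That change is my assumption about the intent, and `find_links` is a made-up helper name.

```python
import re

# Adapted from the pattern above; hyphens moved to the end of each character
# class so that Python's re module accepts it.
LINK_RE = re.compile(
    r"\b((ftp|https?)://)?"                                   # optional protocol
    r"([\w.-]+\.(com|net|org|gov|mil|int|edu|info|me)"        # listed domains
    r"|(\d+\.\d+\.\d+\.\d+))"                                 # or a dotted IP
    r"(:\d+)?"                                                # optional port
    r"(\/[\w\/-]*(\?\w*(=\w+)*[&\w=-]*)*(#[\w-]+)*)?"         # path, query, hash
)

def find_links(text):
    """Return every substring of text matched by LINK_RE."""
    return [m.group(0) for m in LINK_RE.finditer(text)]

print(find_links("Docs at https://example.org:8080/a/b?x=1&y=2#top and 192.168.0.1"))
# ['https://example.org:8080/a/b?x=1&y=2#top', '192.168.0.1']
```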

  • 2020-11-22 14:56

    Here is a slightly more optimized regexp:

    (?:(?:(https?|ftp|file):\/\/|www\.|ftp\.)|([\w\-_]+(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&:\/~\+#]*[A-Z\-\@?^=%&\/~\+#]){2,6}?
    

    Here is a test with data: https://regex101.com/r/sFzzpY/6
