JavaScript Regex to match a URL in a field of text

一向 2020-12-01 02:01

How can I set up a regex to test whether a block of text contains a URL in JavaScript? I can't quite figure out the pattern to use to accomplish this.



        
8 answers
  • 2020-12-01 02:37

    Try this pattern:

    (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
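
    A minimal sketch of how the pattern above could be used to answer the original question, both to test for a URL and to pull every URL out of the text. The helper names `containsUrl` and `extractUrls` and the sample text are made up for illustration; they are not part of the answer.

```javascript
// The pattern from the answer, written as a regex literal (no "g" flag,
// so repeated .test() calls do not carry state between calls).
const urlPattern = /(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?/;

// Hypothetical helper: true if the text contains at least one URL-like substring.
function containsUrl(text) {
    return urlPattern.test(text);
}

// Hypothetical helper: every URL-like substring, or an empty array if none.
function extractUrls(text) {
    const globalPattern = new RegExp(urlPattern.source, "g");
    return text.match(globalPattern) || [];
}

// Illustrative sample text.
const sampleText = "Docs are at https://example.com/docs?page=2, mirror at ftp://files.example.org.";

console.log(containsUrl(sampleText));      // true
console.log(containsUrl("no links here")); // false
console.log(extractUrls(sampleText));
// [ 'https://example.com/docs?page=2', 'ftp://files.example.org' ]
```

    Because the final character class leaves out "." and ",", trailing sentence punctuation is not swallowed into the match.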

  • 2020-12-01 02:40

    Try this; it worked for me:

    /^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/
    

    It is simple and easy to understand.
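
    One thing to keep in mind with this pattern: the ^ and $ anchors make it validate a whole string rather than find a URL inside a longer text, and the (www\.) group is mandatory. The sketch below illustrates that; the `unanchored` variant is a hypothetical adaptation with the anchors removed, not something the answer proposed.

```javascript
// The pattern from the answer: anchored at both ends.
const anchored = /^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/;

console.log(anchored.test("http://www.example.com"));             // true: the whole string is the URL
console.log(anchored.test("visit http://www.example.com today")); // false: surrounding text breaks ^...$
console.log(anchored.test("http://example.com"));                 // false: "www." is required

// Hypothetical variant for searching inside text: same body, anchors dropped.
const unanchored = /((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?/;
const found = "visit http://www.example.com today".match(unanchored);
console.log(found[0]); // "http://www.example.com"
```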

  • 2020-12-01 02:41

    Try this general regex, which covers many URL formats:

    /(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+@{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g
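
    A short usage sketch for this pattern. In a regex literal, every / outside a character class has to be written as \/, otherwise it ends the literal early, so the version below is spelled that way. The sample sentence is invented for illustration.

```javascript
// General URL pattern, with slashes outside character classes escaped.
const generalUrl = /(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+@{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g;

// Illustrative input containing one URL with port, path, query and fragment.
const sentence = "See https://example.com:8080/path/file.html?x=1#top for details";

// With the "g" flag, String.prototype.match returns all full matches.
const matches = sentence.match(generalUrl);
console.log(matches); // [ 'https://example.com:8080/path/file.html?x=1#top' ]
```

    Since the scheme is optional, anything shaped like name.ext (a file name such as report.txt, for instance) also matches, and the final label is limited to 2 or 3 letters, so longer top-level domains are only partially matched.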
    
  • 2020-12-01 02:43

    Here's the most complete single URL parsing pattern.

    It works with ANY URI/URL in ANY substring!

    https://regex101.com/r/jO8bC4/5

    Example JS code with output - every URL is turned into a 5-part array of its 'parts':

    var re = /([a-z]+\:\/+)([^\/\s]*)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#]*)#?([^ \#]*)/ig; 
    var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
    var m;
    
    while ((m = re.exec(str)) !== null) {
        if (m.index === re.lastIndex) {
            re.lastIndex++;
        }
        console.log(m);
    }
    

    Will give you the following:

    ["https://www.facebook.com",
      "https://",
      "www.facebook.com",
      "",
      "",
      ""
    ]
    
    ["https://github.com/justsml?tab=activity#top",
      "https://",
      "github.com",
      "/justsml",
      "tab=activity",
      "top"
    ]
    

    BAM! RegEx FTW!

  • 2020-12-01 02:47

    Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.

    In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
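
    To illustrate both points, the sketch below compares a character class with every special-looking character escaped against one with no escapes and the dash moved to the end. The sample characters are arbitrary.

```javascript
const escaped = /[\w\-\+\@]/;  // dash, plus and at-sign all escaped
const unescaped = /[\w+@-]/;   // no escapes; the trailing dash is a literal dash

// Arbitrary sample characters: some inside the class, some outside.
const samples = ["a", "-", "+", "@", "_", " ", ".", "/"];

for (const ch of samples) {
    console.log(JSON.stringify(ch), escaped.test(ch), unescaped.test(ch));
}

// Both spellings accept and reject exactly the same samples.
const sameResults = samples.every((ch) => escaped.test(ch) === unescaped.test(ch));
console.log(sameResults); // true
```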

    I would recommend the following regex for your purposes:

    (http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
    

    This can be specified in JavaScript either by passing it to the RegExp constructor (as you did in your example), where each backslash must be doubled because the pattern is written as a string literal:

    var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?")
    

    or by directly specifying a regex literal, using the // quoting method:

    var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/
    

    The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.
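
    One detail worth checking when choosing the constructor form: the pattern passes through JavaScript string parsing first, so a single backslash before w or . is dropped before the regex engine ever sees it. A small sketch, using a deliberately short pattern that is not from the answer:

```javascript
// Single backslashes: the string literal turns "\w" into "w" and "\." into ".".
const lost = new RegExp("[\w-]+\.com");

// Doubled backslashes survive string parsing and reach the regex engine.
const kept = new RegExp("[\\w-]+\\.com");

console.log(lost.source); // [w-]+.com
console.log(kept.source); // [\w-]+\.com

console.log(lost.test("example.com")); // false: the class only allows "w" and "-"
console.log(kept.test("example.com")); // true

// The doubled-backslash string is equivalent to the regex literal.
console.log(kept.source === /[\w-]+\.com/.source); // true
```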

    I tested your original and this modification using Chrome both on JSFiddle and on RegexLib.com, using the Client-Side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

    Edit

    As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of URL is desired (which it may or may not be), the following might be more appropriate:

    (http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
    
    #------changed----here-------------^
    

    <End Edit>
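
    The difference between the two variants in the edit can be checked directly. In this sketch the names `requiresDot` and `allowsSingleWord` and the sample URL are made up for illustration; the only difference between the patterns is the + versus * after the dotted-label group.

```javascript
// Host must contain at least one dot: (\.[\w-]+)+
const requiresDot = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Dotted labels are optional: (\.[\w-]+)*
const allowsSingleWord = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Illustrative single-word host.
const local = "http://localhost/index.html";

console.log(requiresDot.test(local));          // false: "localhost" has no dot
console.log(local.match(allowsSingleWord)[0]); // "http://localhost/index.html"

// Both variants still agree on an ordinary dotted host.
console.log(requiresDot.test("https://example.com/a"));      // true
console.log(allowsSingleWord.test("https://example.com/a")); // true
```

    The trade-off is that the looser variant accepts any single word after the scheme, so a string such as http://foo counts as a URL.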

    Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!

  • 2020-12-01 02:47

    The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.
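
    A small sketch of the range behaviour described above, using throwaway patterns rather than the one from the question. It also shows the kind of error an engine raises when a dash accidentally forms a range whose endpoints are in the wrong order.

```javascript
// Unescaped dash between two characters: a range from "a" to "z".
console.log(/[a-z]/.test("m"));  // true

// Escaped dash: the class holds exactly three characters, "a", "-" and "z".
console.log(/[a\-z]/.test("m")); // false
console.log(/[a\-z]/.test("-")); // true

// A range whose start sorts after its end is rejected when the regex is compiled.
let errorName = "";
try {
    new RegExp("[z-a]");
} catch (err) {
    errorName = err.name;
}
console.log(errorName); // SyntaxError
```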
