Detect URLs in text with JavaScript

孤城傲影 2020-11-22 06:23

Does anyone have suggestions for detecting URLs in a set of strings?

  // detect URLs in strings and do something sw         

    2020-11-22 07:05

    I googled this problem for quite a while before it occurred to me that Android has a method, android.text.util.Linkify, that uses some pretty robust regexes to accomplish this. Luckily, Android is open source.

    They use a few different patterns for matching different types of URLs. You can find them all here:

    If you're only concerned with URLs that match the WEB_URL_PATTERN, that is, URLs that conform to the RFC 1738 spec, you can use this:


    Here is the full text of the source:

    + "\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_"
    + "\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?"
    + "((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+"   // named host
    + "(?:"   // plus top level domain
    + "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
    + "|(?:biz|b[abdefghijmnorstvwyz])"
    + "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])"
    + "|d[ejkmoz]"
    + "|(?:edu|e[cegrstu])"
    + "|f[ijkmor]"
    + "|(?:gov|g[abdefghilmnpqrstuwy])"
    + "|h[kmnrtu]"
    + "|(?:info|int|i[delmnoqrst])"
    + "|(?:jobs|j[emop])"
    + "|k[eghimnrwyz]"
    + "|l[abcikrstuvy]"
    + "|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])"
    + "|(?:name|net|n[acefgilopruz])"
    + "|(?:org|om)"
    + "|(?:pro|p[aefghklmnrstwy])"
    + "|qa"
    + "|r[eouw]"
    + "|s[abcdeghijklmnortuvyz]"
    + "|(?:tel|travel|t[cdfghjklmnoprtvwz])"
    + "|u[agkmsyz]"
    + "|v[aceginu]"
    + "|w[fs]"
    + "|y[etu]"
    + "|z[amw]))"
    + "|(?:(?:25[0-5]|2[0-4]" // or ip address
    + "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]"
    + "|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]"
    + "[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}"
    + "|[1-9][0-9]|[0-9])))"
    + "(?:\\:\\d{1,5})?)" // plus option port number
    + "(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~"  // plus option query params
    + "\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?"
    + "(?:\\b|$)";
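
    Since the question is about JavaScript, here is a minimal sketch of how a pattern in the same spirit could be applied there. It is not a port of the Android pattern: URL_PATTERN, findUrls and the sample strings are hypothetical names and data, the scheme is limited to http/https, and any alphabetic top-level domain is accepted instead of a fixed list. Note that the doubled backslashes of a Java string literal become single backslashes in a JavaScript regex literal.

```javascript
// Simplified, hypothetical URL pattern (an assumption, NOT Android's WEB_URL):
// http(s) scheme, dotted host name, optional port, optional path/query.
const URL_PATTERN =
  /\bhttps?:\/\/(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,62}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}(?::\d{1,5})?(?:\/[^\s<>"']*)?/g;

// Returns every URL-like substring in `text`, with trailing sentence
// punctuation (., ; : ! ?) stripped from each match.
function findUrls(text) {
  const matches = text.match(URL_PATTERN) || [];
  return matches.map((url) => url.replace(/[.,;:!?]+$/, ""));
}

// Hypothetical input, standing in for the set of strings in the question.
const strings = [
  "Docs live at https://example.com/docs?page=2 for now.",
  "No links in this one.",
  "Two: http://a.example.org and https://b.example.net:8080/x",
];

strings.forEach((s) => {
  // "Do something" with each detected URL; here they are only logged.
  findUrls(s).forEach((url) => console.log(url));
});
```

    A pattern this loose will accept hosts with made-up top-level domains and will miss URLs written without a scheme, so treat it as a starting point rather than a validator.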

    If you want to be really fancy, you can test for email addresses as well. The regex for email addresses is:
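
    As a rough illustration of the same idea in JavaScript, the sketch below finds email-like substrings. EMAIL_PATTERN and findEmails are hypothetical names, and the pattern is a deliberately simplified assumption, not the Android email pattern.

```javascript
// Simplified, hypothetical email pattern (an assumption, not Android's):
// local part of up to 64 characters, "@", dotted host, alphabetic TLD.
const EMAIL_PATTERN =
  /[a-zA-Z0-9._%+-]{1,64}@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,62}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}/g;

// Returns every email-like substring found in `text`.
function findEmails(text) {
  return text.match(EMAIL_PATTERN) || [];
}

const sample = "Write to jane.doe@example.com or ops+alerts@mail.example.org.";
findEmails(sample).forEach((address) => console.log(address));
```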


    PS: The top-level domains supported by the above regex are current as of June 2007. For an up-to-date list you'll need to check
