Regex to match URL

耶瑟儿~ 2020-11-22 09:41

I am using the following regex to match a URL:

$search  = "/([\S]+\.(MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|AC|A


        
14 answers
  • 2020-11-22 09:48

    Try Regexy::Web::Url

    r = Regexy::Web::Url.new # matches 'http://foo.com', 'www.foo.com' and 'foo.com'

  • 2020-11-22 09:49
    $search  = "#^((?#
        the scheme:
      )(?:https?://)(?#
        second level domains and beyond:
      )(?:[\S]+\.)+((?#
        top level domains:
      )MUSEUM|TRAVEL|AERO|ARPA|ASIA|EDU|GOV|MIL|MOBI|(?#
      )COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|(?#
      )A[CDEFGILMNOQRSTUWXZ]|B[ABDEFGHIJLMNORSTVWYZ]|(?#
      )C[ACDFGHIKLMNORUVXYZ]|D[EJKMOZ]|(?#
      )E[CEGHRSTU]|F[IJKMOR]|G[ABDEFGHILMNPQRSTUWY]|(?#
      )H[KMNRTU]|I[DELMNOQRST]|J[EMOP]|(?#
      )K[EGHIMNPRWYZ]|L[ABCIKRSTUVY]|M[ACDEFGHKLMNOPQRSTUVWXYZ]|(?#
      )N[ACEFGILOPRUZ]|OM|P[AEFGHKLMNRSTWY]|QA|R[EOSUW]|(?#
      )S[ABCDEGHIJKLMNORTUVYZ]|T[CDFGHJKLMNOPRTVWZ]|(?#
      )U[AGKMSYZ]|V[ACEGINU]|W[FS]|Y[ETU]|Z[AMW])(?#
        the path, can be there or not:
      )(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?)$#i";
    

    I just cleaned it up a bit. This matches only HTTP(S) addresses that declare the http:// or https:// scheme and, provided you copied all the top-level domains correctly from IANA, only those with a standardized TLD (so it will not match http://localhost).

    The pattern should end with the path part, which, when present, always starts with a /.
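To make the structure easier to see, here is a reduced JavaScript sketch of the same pattern. JavaScript has no (?#...) comments, so the pieces are assembled from strings instead. The TLDS list is a shortened, hypothetical stand-in for the full IANA list, and the names (SCHEME, LABELS, isHttpUrl and so on) are only for illustration.

```javascript
// Shortened, hypothetical TLD list; the real pattern enumerates every IANA TLD.
const TLDS = ["COM", "NET", "ORG", "INFO", "DE", "UK"];

const SCHEME = "(?:https?://)";                 // the scheme, required here
const LABELS = "(?:[\\S]+\\.)+";                // second level domains and beyond
const TLD = "(" + TLDS.join("|") + ")";         // top level domains
const PATH = "(/[a-z0-9._/~%\\-+&#?!=()@]*)?";  // the path, can be there or not

// Anchored at both ends and case-insensitive, like the PHP version.
const httpUrl = new RegExp("^(" + SCHEME + LABELS + TLD + PATH + ")$", "i");

function isHttpUrl(text) {
  return httpUrl.test(text);
}

console.log(isHttpUrl("http://example.com"));          // true
console.log(isHttpUrl("https://example.org/a/b?x=1")); // true
console.log(isHttpUrl("http://localhost"));            // false: no listed TLD
console.log(isHttpUrl("example.com"));                 // false: scheme missing
```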

    However, I'd suggest following Cerebrus's advice: if you're not sure about this, learn regexps in a gentler way and use proven patterns for complicated tasks.

    Cheers,

    By the way: your regexp will also match something.r and something.h (between |TO| and |TR| in your example). I left them out of my version, as I guess that was a typo.

    On re-reading the question: Change

      )(?:https?://)(?#
    

    to

      )(?:https?://)?(?#
    

    (note the extra ?) to also match 'URLs' without the scheme.
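A small JavaScript sketch of what that single ? changes. The TLD list is cut down to three hypothetical entries, and REST, schemeRequired and schemeOptional are illustrative names, not part of the original pattern.

```javascript
// Everything after the scheme: host labels, a short stand-in TLD list, optional path.
const REST = "(?:[\\S]+\\.)+(COM|NET|ORG)(/[a-z0-9._/~%\\-+&#?!=()@]*)?";

const schemeRequired = new RegExp("^((?:https?://)" + REST + ")$", "i");
const schemeOptional = new RegExp("^((?:https?://)?" + REST + ")$", "i"); // note the extra ?

console.log(schemeRequired.test("www.example.com/page")); // false
console.log(schemeOptional.test("www.example.com/page")); // true
console.log(schemeOptional.test("http://example.com"));   // true
console.log(schemeOptional.test("ftp://example.com"));    // true
```

The last line is worth noticing: once the scheme is optional, [\S]+ is free to swallow any non-whitespace prefix, so inputs such as ftp://example.com are accepted as well. If that matters, the host part would need a stricter character class.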

  • 2020-11-22 09:49

    $ : The dollar sign anchors the match to the end of the string.
    For example, \d+$ matches strings that end with a digit (\d*$ would match any string, because it also accepts zero digits). So you need to add the $!
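A quick JavaScript illustration of the end anchor. It uses \d+ so that at least one digit is required; the variable names are made up for the example.

```javascript
// "$" pins the match to the end of the input.
const endsWithDigit = /\d+$/;
console.log(endsWithDigit.test("room 42"));  // true
console.log(endsWithDigit.test("42 rooms")); // false

// Without "$" the same pattern matches digits anywhere in the string.
const hasDigit = /\d+/;
console.log(hasDigit.test("42 rooms"));      // true

// With "*" instead of "+", zero digits also count, so every string matches.
const zeroOrMoreDigits = /\d*$/;
console.log(zeroOrMoreDigits.test("abc"));   // true
```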

  • 2020-11-22 09:50

    It was surprisingly difficult to find an answer to this question. The regexes I found were too complicated to understand, and anything more than a regex is overkill and too difficult to implement.

    Finally came up with:

    /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/
    

    Works with http://example.com, https://example.com, example.com, http://example.com/foo.

    Explanation:

    • Looks for .com, .net, etc.
    • Matches everything before it, back to the preceding whitespace
    • Optionally matches a / and everything after it, up to the next whitespace
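Here is a short JavaScript sketch that applies this pattern to free text. The helper name firstUrl is hypothetical; the regex is the one from the answer, unchanged.

```javascript
const simpleUrl = /(\S+\.(com|net|org|edu|gov)(\/\S+)?)/;

// Return the first URL-like token in the text, or null if there is none.
function firstUrl(text) {
  const found = text.match(simpleUrl);
  return found ? found[1] : null;
}

console.log(firstUrl("see http://example.com/foo for details")); // "http://example.com/foo"
console.log(firstUrl("visit example.com today"));                // "example.com"
console.log(firstUrl("no links here"));                          // null
console.log(firstUrl("example.company"));                        // "example.com"
```

The last call shows the price of the simplicity: nothing requires the TLD to end at a word boundary, so example.company yields example.com. Whether that is acceptable depends on the input you expect.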
  • 2020-11-22 09:55

    Changing the end of the regex to (/\S*)?)$ should solve your problem.

    To explain what that is doing -

    • it looks for a / followed by any number of non-whitespace characters
    • this group is optional: ? means 0 or 1 times
    • and finally it must be followed by the end of the string (or change $ to \b to match at a word boundary instead).
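A JavaScript sketch of both endings. Since the question's full pattern is not reproduced here, a stand-in with three hypothetical TLDs is used, and anchoredUrl and boundaryUrl are illustrative names.

```javascript
// Ending with (/\S*)?)$ : the optional path must run to the end of the string.
const anchoredUrl = /(\S+\.(com|net|org)(\/\S*)?)$/i;
console.log(anchoredUrl.test("example.com"));               // true
console.log(anchoredUrl.test("example.com/"));              // true
console.log(anchoredUrl.test("example.com/a/b.html"));      // true
console.log(anchoredUrl.test("example.comics"));            // false
console.log(anchoredUrl.test("example.com/path and more")); // false

// Ending with \b instead: the match may stop at a word boundary inside longer text.
const boundaryUrl = /(\S+\.(com|net|org)(\/\S*)?)\b/i;
console.log(boundaryUrl.exec("example.com/path and more")[1]); // "example.com/path"
console.log(boundaryUrl.test("example.comics"));               // false
```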
  • 2020-11-22 09:55

    Just to add to things: I know this doesn't directly answer this specific question, but it's the best place I can find to add this info. I wrote a jQuery plugin a while back to match URLs for a similar purpose. In its current state (I will update it as time goes on) it still considers addresses like 'http://abc.php' valid. However, if there is no scheme such as http, https, or ftp at the start of the URL, it will not report it as valid.

    I should clarify that this jQuery method returns an object, not just a string or boolean. The object breaks the URL down into its parts and includes a .valid boolean. See the full fiddle and test at the link at the bottom. If you simply want to grab the plugin and go, see below:

    jQuery Plugin

    (function($){$.matchUrl||$.extend({matchUrl:function(c){var b=void 0,d="url,,scheme,,authority,path,,query,,fragment".split(","),e=/^(([^\:\/\?\#]+)\:)?(\/\/([^\/\?\#]*))?([^\?\#]*)(\?([^\#]*))?(\#(.*))?/,a={url:void 0,scheme:void 0,authority:void 0,path:void 0,query:void 0,fragment:void 0,valid:!1};"string"===typeof c&&""!=c&&(b=c.match(e));if("object"===typeof b)for(x in b)d[x]&&""!=d[x]&&(a[d[x]]=b[x]);a.scheme&&a.authority&&(a.valid=!0);return a}});})(jQuery);
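For readers who want to follow what the minified plugin does, here is an un-minified sketch of the same logic as a standalone function, without the jQuery wrapper. The names parseUrl, URI_PARTS and GROUP_NAMES are hypothetical. The regular expression is the generic URI-splitting expression from RFC 3986, Appendix B, which the plugin also uses.

```javascript
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Capture-group index -> property name; the groups not listed are only wrappers.
const GROUP_NAMES = { 0: "url", 2: "scheme", 4: "authority", 5: "path", 7: "query", 9: "fragment" };

function parseUrl(input) {
  const result = {
    url: undefined, scheme: undefined, authority: undefined,
    path: undefined, query: undefined, fragment: undefined, valid: false
  };
  if (typeof input !== "string" || input === "") return result;

  const groups = input.match(URI_PARTS);
  if (groups) {
    for (const [index, name] of Object.entries(GROUP_NAMES)) {
      result[name] = groups[index];
    }
  }
  // As in the plugin, "valid" only means a scheme and an authority were found.
  result.valid = Boolean(result.scheme && result.authority);
  return result;
}

console.log(parseUrl("http://abc.php").valid);  // true
console.log(parseUrl("example.com/foo").valid); // false
```

This makes the caveat about 'http://abc.php' easy to see: the check never looks at the TLD, only at whether a scheme and an authority are present.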
    

    jsFiddle with example:

    http://jsfiddle.net/SpYk3/e4Ank/
