How to check if a URL is valid

自闭症患者 2020-11-28 04:23

How can I check if a string is a valid URL?

For example:

http://hello.it => yes
http:||bra.ziz, => no

If this is a valid URL

9 Answers
  • 2020-11-28 04:34

    The problem with the current answers is that not every URI is a URL.

    A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

    Since URLs are a subset of URIs, it is clear that matching specifically for URIs will successfully match undesired values. For example, URNs:

     "urn:isbn:0451450523" =~ URI::regexp
     => 0 
    

    That being said, as far as I know, Ruby doesn't have a default way to parse URLs specifically, so you'll most likely need a gem to do so. If you only need to match URLs with the HTTP or HTTPS scheme, you could do something like this:

    require 'uri'

    # URI.parse raises URI::InvalidURIError if the string cannot be parsed.
    uri = URI.parse(my_possible_url)
    if uri.kind_of?(URI::HTTP) || uri.kind_of?(URI::HTTPS)
      # do your stuff
    end
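
    As a minimal sketch, the check above can be wrapped in a predicate that also handles strings URI.parse refuses. The method name http_url? and the extra host check are assumptions added for illustration; they are not part of the original answer.

```ruby
require 'uri'

# Hypothetical helper: true only for parseable http/https URLs with a host.
def http_url?(candidate)
  uri = URI.parse(candidate)
  # URI::HTTPS is a subclass of URI::HTTP, so one is_a? check covers both.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  # Raised for strings such as "http:||bra.ziz," that are not URIs at all.
  false
end

puts http_url?("http://hello.it")
puts http_url?("urn:isbn:0451450523")
```

    A URN still parses as a URI, but it comes back as a URI::Generic rather than a URI::HTTP, so the predicate rejects it.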
    
  • 2020-11-28 04:34

    This is a little bit old but here is how I do it. Use Ruby's URI module to parse the URL. If it can be parsed, the string is syntactically valid. (That doesn't mean the URL is reachable.)

    URI supports many schemes, plus you can add custom schemes yourself:

    irb> uri = URI.parse "http://hello.it" rescue nil
    => #<URI::HTTP:0x10755c50 URL:http://hello.it>
    
    irb> uri.instance_values  # instance_values is added by ActiveSupport (e.g. in a Rails console)
    => {"fragment"=>nil,
     "registry"=>nil,
     "scheme"=>"http",
     "query"=>nil,
     "port"=>80,
     "path"=>"",
     "host"=>"hello.it",
     "password"=>nil,
     "user"=>nil,
     "opaque"=>nil}
    
    irb> uri = URI.parse "http:||bra.ziz" rescue nil
    => nil
    
    
    irb> uri = URI.parse "ssh://hello.it:5888" rescue nil
    => #<URI::Generic:0x105fe938 URL:ssh://hello.it:5888>
    irb> uri.instance_values
    => {"fragment"=>nil,
     "registry"=>nil,
     "scheme"=>"ssh",
     "query"=>nil,
     "port"=>5888,
     "path"=>"",
     "host"=>"hello.it",
     "password"=>nil,
     "user"=>nil,
     "opaque"=>nil}
    

    See the documentation for more information about the URI module.
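
    One caveat, sketched below: parsing alone is a fairly loose test, because a bare string such as hello.it also parses (as a relative reference with no scheme). The helper name absolute_uri_with_host? is hypothetical, and requiring both a scheme and a host is an assumption about what "valid" should mean here.

```ruby
require 'uri'

# Hypothetical helper: the string must parse AND carry a scheme and a host.
def absolute_uri_with_host?(candidate)
  uri = URI.parse(candidate)
  !uri.scheme.nil? && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts absolute_uri_with_host?("ssh://hello.it:5888")
puts absolute_uri_with_host?("hello.it")
puts absolute_uri_with_host?("http:||bra.ziz")
```

    Catching URI::InvalidURIError explicitly is narrower than a trailing "rescue nil", which would also hide unrelated errors.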

  • 2020-11-28 04:36

    Similar to the answers above, I find using this regex to be slightly more accurate:

    URI::DEFAULT_PARSER.regexp[:ABS_URI]
    

    That will reject URLs with spaces, as opposed to URI.regexp, which allows spaces for some reason.

    I have recently found a shortcut that is provided for the different URI regexps. You can access any of URI::DEFAULT_PARSER.regexp.keys directly from URI::#{key}.

    For example, the :ABS_URI regexp can be accessed from URI::ABS_URI.
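
    A small sketch of using that pattern as a predicate follows. It builds a URI::RFC2396_Parser explicitly instead of going through URI::DEFAULT_PARSER, on the assumption that the default parser may differ between Ruby versions; the names ABS_URI_PATTERN and absolute_uri? are made up for this example.

```ruby
require 'uri'

# The RFC 2396 parser exposes its compiled patterns as a hash keyed by symbol.
ABS_URI_PATTERN = URI::RFC2396_Parser.new.regexp[:ABS_URI]

# Hypothetical helper: true if the whole string is an absolute URI.
def absolute_uri?(candidate)
  candidate.match?(ABS_URI_PATTERN)
end

puts absolute_uri?("http://hello.it")
puts absolute_uri?("http://hello it")
```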

  • 2020-11-28 04:36

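    In general,

    /\A#{URI::regexp}\z/
    

    will work well. Note the \A and \z anchors: in Ruby, ^ and $ match at every line break, so /^...$/ would accept a multi-line string that merely contains a URL on one of its lines. If you only want to match http or https, you can pass those in as options to the method:

    /\A#{URI::regexp(%w(http https))}\z/
    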
    

    That tends to work a little better, if you want to reject protocols like ftp://.
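
    One detail worth checking with patterns like these is anchoring: in Ruby, ^ and $ match at every line, whereas \A and \z anchor to the whole string. The sketch below builds the scheme-restricted pattern with URI::RFC2396_Parser#make_regexp (which, as far as I can tell, is what URI.regexp delegates to) and wraps it in whole-string anchors. The names HTTP_URL_PATTERN and http_url_string? are hypothetical.

```ruby
require 'uri'

# Only http and https are accepted; \A and \z force the whole string to match.
HTTP_URL_PATTERN = /\A#{URI::RFC2396_Parser.new.make_regexp(%w[http https])}\z/

# Hypothetical helper built on the pattern above.
def http_url_string?(candidate)
  candidate.match?(HTTP_URL_PATTERN)
end

puts http_url_string?("http://hello.it")
puts http_url_string?("ftp://hello.it")
puts http_url_string?("http://hello.it\n<script>")
```

    With ^ and $ in place of \A and \z, the last string would be accepted, because its first line is a valid URL.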

  • 2020-11-28 04:39

    I use this regular expression:

    /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
    

    Options:

    • i - case insensitive
    • x - ignore whitespace in regex

    You can wrap it in a method to validate URLs:

    def valid_url?(url)
      # Reject anything that carries an inline script tag outright.
      return false if url.include?("<script")
      # \A and \z anchor the whole string; ^ and $ would only anchor a single line.
      url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
      url.match?(url_regexp)
    end
    

    To use it:

    valid_url?("http://stackoverflow.com/questions/1805761/check-if-url-is-valid-ruby")
    

    Tests with invalid URLs:

    • http://ruby3arabi - result is invalid
    • http://http://ruby3arabi.com - result is invalid
    • http:// - result is invalid
    • http://test.com\n<script src=\"nasty.js\"> - result is invalid (caught by the simple "<script" check)

    Tests with valid URLs:

    • http://ruby3arabi.com - result is valid
    • http://www.ruby3arabi.com - result is valid
    • https://www.ruby3arabi.com - result is valid
    • https://www.ruby3arabi.com/article/1 - result is valid
    • https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en - result is valid
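
    The cases listed above can be replayed as a small table-driven check. This is only a sketch: it repeats the method so the snippet runs on its own, anchors the pattern with \A and \z so the whole string must match, and the names URL_PATTERN and EXPECTATIONS are made up for the example.

```ruby
URL_PATTERN = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix

def valid_url?(url)
  return false if url.include?("<script")
  url.match?(URL_PATTERN)
end

# Each URL from the lists above, mapped to the result the answer reports.
EXPECTATIONS = {
  "http://ruby3arabi" => false,
  "http://http://ruby3arabi.com" => false,
  "http://" => false,
  "http://test.com\n<script src=\"nasty.js\">" => false,
  "http://ruby3arabi.com" => true,
  "http://www.ruby3arabi.com" => true,
  "https://www.ruby3arabi.com" => true,
  "https://www.ruby3arabi.com/article/1" => true,
  "https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en" => true
}.freeze

EXPECTATIONS.each do |url, expected|
  actual = valid_url?(url)
  status = actual == expected ? "ok" : "MISMATCH"
  puts "#{status}: #{url.inspect} -> #{actual}"
end
```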
  • 2020-11-28 04:44

    I prefer the Addressable gem. I have found that it handles URLs more intelligently.

    require 'addressable/uri'
    
    SCHEMES = %w(http https)
    
    def valid_url?(url)
      parsed = Addressable::URI.parse(url) or return false
      SCHEMES.include?(parsed.scheme)
    rescue Addressable::URI::InvalidURIError
      false
    end
    