What regex can I use to get the domain name from a URL in Ruby?

醉梦人生 2021-01-12 21:07

I am trying to construct a regex to extract the domain from a given URL.

for:

http://www.abc.google.com/
http://abc.google.com/
https://www.abc.google.com/
         


        
4 Answers
  • 2021-01-12 21:18

    I don't know much about Ruby, but this regex pattern gives you the last 3 parts of the URL, excluding the trailing slash, with a minimum of 2 characters per part.

    ([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})/$
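Since the answer does not show the pattern in Ruby, here is a minimal sketch of how it might be applied. The constant `LAST_THREE_LABELS` and the helper `last_three_labels` are names made up for illustration; the pattern itself is unchanged.

```ruby
# The pattern from this answer, written with %r{...} so the slash
# does not need escaping.
LAST_THREE_LABELS = %r{([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})/$}

# Hypothetical helper: returns the first capture group, or nil when
# the pattern does not match.
def last_three_labels(url)
  url[LAST_THREE_LABELS, 1]
end

last_three_labels('http://www.abc.google.com/') # => "abc.google.com"
last_three_labels('http://abc.google.com/')     # => "abc.google.com"
last_three_labels('http://abc.google.com')      # => nil
```

Note that the pattern requires the string to end with a slash, so a URL without one (or with a path after the host) returns nil here.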
    
  • 2021-01-12 21:27

    You may be able to use the domain_name gem for this kind of work. From the README:

    require "domain_name"
    host = DomainName("a.b.example.co.uk")
    host.domain         #=> "example.co.uk"
    
  • 2021-01-12 21:30

    Your question is a little vague. Can you give a precise specification of exactly what you want to do? (Preferably with a test suite.) Right now, all your question says is that you want a method that always returns 'abc.google.com'. That's easy:

    def extract_domain
      return 'abc.google.com'
    end
    

    But that's probably not what you meant …

    Also, you say that you need a Regexp. Why? What's wrong with, for example, using the URI class? After all, parsing and manipulating URIs is exactly what it was made for!

    require 'uri'
    
    URI.parse('https://abc.google.com/').host # => 'abc.google.com'
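One thing to watch with `URI.parse` is input that is not a well-formed absolute URL. The sketch below wraps the call in a hypothetical helper, `extract_host` (not part of the answer), that returns nil instead of raising.

```ruby
require 'uri'

# Hypothetical helper: returns the host component, or nil when the
# string cannot be parsed or carries no host.
def extract_host(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

extract_host('https://abc.google.com/') # => "abc.google.com"
extract_host('abc.google.com/')         # => nil (no scheme, so it parses as a path)
extract_host('http://bad host/')        # => nil (the space makes the URI invalid)
```

Returning nil for both failure modes is a design choice for this sketch; callers that need to tell them apart could let the exception propagate.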
    

    And lastly, you say you are "trying to extract a domain", but you never specify what you mean by "domain". It looks like you sometimes mean the FQDN and sometimes drop parts of the FQDN at random, but according to what rules? For example, for the FQDN abc.google.com, the domain name is google.com and the host name is abc, but you want it to return abc.google.com, which is not just the domain name but the full FQDN. Why?
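To make the host/domain distinction concrete, here is a deliberately naive sketch. The helper `split_fqdn` is hypothetical, and it assumes the domain is always the last two labels, which is wrong for suffixes such as co.uk; a public-suffix-aware library would be needed for the general case.

```ruby
# Naive illustration only: treats the last two labels as the domain
# and everything before them as the host part.
def split_fqdn(fqdn)
  labels = fqdn.split('.')
  { host: labels[0...-2].join('.'), domain: labels.last(2).join('.') }
end

parts = split_fqdn('abc.google.com')
parts[:host]   # => "abc"
parts[:domain] # => "google.com"
```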

  • 2021-01-12 21:40
    URI.parse('http://www.abc.google.com/').host
    #=> "www.abc.google.com"
    

    Not a regex, but probably more robust than anything we come up with here.

    URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')
    

    If you also want to remove the leading www., this will do it, and it raises no error when the www. is not there.
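As a quick check against the three URLs from the question, the two steps can be combined as sketched below. The helper name `host_without_www` is made up, and the sketch assumes the desired result for all three is abc.google.com. It uses `sub` with `\A` instead of `gsub` with `^`, since only a single prefix at the very start of the string should be removed.

```ruby
require 'uri'

# Hypothetical helper: parse the URL, then drop one leading "www.".
def host_without_www(url)
  URI.parse(url).host.sub(/\Awww\./, '')
end

urls = %w[
  http://www.abc.google.com/
  http://abc.google.com/
  https://www.abc.google.com/
]

urls.map { |u| host_without_www(u) }.uniq # => ["abc.google.com"]
```

This still raises NoMethodError when the input has no host (for example, a string without a scheme), so it is only a sketch for well-formed URLs like the ones above.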
