Question
I've seen similar questions, but none quite gets at what I'm looking for. I'm trying to extract the main domain of a server from its URL, without any subdomains. For example, given "http://forums.example.com/", I want to extract just the "example.com" portion. I've tried splitting at the second-to-last dot, but that fails for URLs like "http://forums.example.co.uk/": it yields "co.uk" when I want "example.co.uk". Is there a way to parse URLs like this without having to find a list of TLDs to compare against?
PS: In case it matters, I will be using this in the context of mail servers, so the inputs will likely look more like "mail.example.co.uk" or "message-ID@user.mail.example.co.uk".
Edit: I know the answer to this question is the same as one of the answers in the "duplicate" question, but I believe the questions themselves are different. In the other question, the asker did not care about subdomains, so the selected answer used urlparse, which doesn't distinguish the subdomain from the domain. In addition, this question also asks about email addresses, and urlparse doesn't work on email addresses (throws invalid url exception). So I believe this question is distinct from the other and not a duplicate.
Answer 1:
You want to check out tldextract. With it you can do everything you want easily. For example:
>>> import tldextract
>>> extracted_domain = tldextract.extract('forums.example.com')
>>> extracted_domain
ExtractResult(subdomain='forums', domain='example', suffix='com')
Then you can just:
>>> domain = "{}.{}".format(extracted_domain.domain, extracted_domain.suffix)
>>> domain
'example.com'
It also works with emails:
>>> tldextract.extract('message-ID@user.mail.example.co.uk')
ExtractResult(subdomain='user.mail', domain='example', suffix='co.uk')
Just use pip to install it: pip install tldextract
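To see why a suffix list is unavoidable here, below is a minimal standard-library sketch of the longest-suffix matching idea. It is not how tldextract is implemented internally; the function name `registered_domain` and the tiny `SAMPLE_SUFFIXES` set are hypothetical and exist only for illustration. In real use you would rely on tldextract, which works from the full Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix set for illustration only;
# the real Public Suffix List has thousands of entries.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}


def registered_domain(value, suffixes=SAMPLE_SUFFIXES):
    """Return the domain plus its suffix, or None if no suffix matches."""
    # Drop a scheme and path if present, e.g. "http://forums.example.com/".
    if "://" in value:
        value = urlsplit(value).netloc
    # Drop anything before "@" (email local part or URL userinfo) and any port.
    host = value.rpartition("@")[2].split(":")[0].strip(".").lower()
    labels = host.split(".")
    # Check the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # If the whole host is itself a suffix, there is no domain to return.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


print(registered_domain("http://forums.example.co.uk/"))
print(registered_domain("message-ID@user.mail.example.co.uk"))
```

Without "co.uk" in the set, the second-to-last-dot problem from the question comes back, which is why some list of suffixes has to live somewhere, even if a library hides it from you.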
Source: https://stackoverflow.com/questions/45022331/get-just-domain-name-from-url-in-python