Get just domain name from URL in Python [duplicate]

走远了吗. Submitted on 2019-12-12 03:27:29

Question


I've seen questions similar to this, but none that really get at what I'm looking for. I'm trying to extract the main domain of a server from its URL, without any subdomains. For example, given "http://forums.example.com/", I want to extract just the "example.com" portion. I've tried splitting at the second-to-last dot, but that breaks on URLs like "http://forums.example.co.uk/": it extracts "co.uk" when I want "example.co.uk". Is there a way to parse URLs this way without having to find a list of TLDs to compare against?
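
The splitting approach described above can be sketched roughly as follows. The helper name `naive_domain` is hypothetical and only illustrates the problem; it is not a recommended solution.

```python
def naive_domain(host):
    """Keep only the last two dot-separated labels of a hostname."""
    return ".".join(host.split(".")[-2:])


print(naive_domain("forums.example.com"))    # example.com
print(naive_domain("forums.example.co.uk"))  # co.uk, not example.co.uk
```

The second call shows why counting dots is not enough: whether a suffix has one label ("com") or two ("co.uk") cannot be inferred from the string alone.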

PS: In case it matters, I will be using this in the context of mail servers, so the inputs will likely look more like "mail.example.co.uk" or "message-ID@user.mail.example.co.uk".

Edit: I know that the answer to this question is the same as one of the answers in the "duplicate" question, but I believe the questions themselves are different. The other question did not care about subdomains, so its selected answer used urlparse, which doesn't distinguish a subdomain from the domain. This question also covers email addresses, and urlparse doesn't handle those: an address has no scheme, so no host is extracted from it. So I believe this question is distinct from the other and not a duplicate.
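
A small check of the `urlparse` behavior mentioned above, using only the standard library. This is a sketch based on the Python 3 `urllib.parse` behavior I'm aware of, where a string without a scheme does not raise an exception but ends up entirely in `path`; the variable names are illustrative.

```python
from urllib.parse import urlparse

# With a scheme, the hostname is found but still includes the subdomain.
with_scheme = urlparse("http://forums.example.com/")
print(with_scheme.hostname)   # forums.example.com

# Without a scheme, nothing is recognized as a host.
no_scheme = urlparse("message-ID@user.mail.example.co.uk")
print(no_scheme.hostname)     # None
print(no_scheme.path)         # message-ID@user.mail.example.co.uk
```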


Answer 1:


Take a look at tldextract. It splits a hostname into subdomain, domain, and public suffix using the Public Suffix List, so you don't have to maintain a TLD list yourself. For example:

>>> import tldextract
>>> extracted_domain = tldextract.extract('forums.example.com')
>>> extracted_domain
ExtractResult(subdomain='forums', domain='example', suffix='com')

Then join the domain and suffix to get the result you want:

>>> domain = "{}.{}".format(extracted_domain.domain, extracted_domain.suffix)
>>> domain
'example.com'

It also works with email-style addresses:

>>> tldextract.extract('message-ID@user.mail.example.co.uk')
ExtractResult(subdomain='user.mail', domain='example', suffix='co.uk')
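
To give a feel for what a suffix-list lookup does, here is a stdlib-only sketch. It is not how tldextract is implemented: `DEMO_SUFFIXES`, `host_of`, and `registrable_domain` are hypothetical names, and the three-entry suffix set stands in for the real Public Suffix List, which is far larger and has wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix set for illustration only.
DEMO_SUFFIXES = {"com", "uk", "co.uk"}


def host_of(value):
    """Return the lowercase host part of a URL, bare host, or address."""
    if "://" in value:
        return (urlsplit(value).hostname or "").lower()
    # No scheme: drop anything before the last '@' and after the first '/'.
    host = value.rsplit("@", 1)[-1].split("/", 1)[0]
    return host.lower()


def registrable_domain(value, suffixes=DEMO_SUFFIXES):
    """Return the longest matching suffix plus one label, or None."""
    labels = host_of(value).split(".")
    # Try the longest candidate suffix first so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host is itself a suffix: nothing to return.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


print(registrable_domain("http://forums.example.co.uk/"))
print(registrable_domain("message-ID@user.mail.example.co.uk"))
```

Both calls print `example.co.uk`. Inputs without a scheme but with a port (such as "example.com:8080") are not handled here. In practice, prefer tldextract, which keeps the suffix data current.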

Install it with pip: pip install tldextract



Source: https://stackoverflow.com/questions/45022331/get-just-domain-name-from-url-in-python
