Check for a valid domain name in a string?

Submitted by 别等时光非礼了梦想 on 2019-11-27 03:31:11

Question


I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists on the Internet.


Answer 1:


Any domain name is (syntactically) valid if it's a dot-separated list of identifiers, each no longer than 63 characters, and made up of letters, digits and dashes (no underscores).

So:

r'[a-zA-Z\d-]{1,63}(\.[a-zA-Z\d-]{1,63})*'

would be a start. Of course, these days some non-ASCII characters may be allowed (a very recent development), which changes the parameters a lot -- do you need to deal with that?
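As a minimal sketch of how that rule might be applied, the code below anchors the pattern with re.fullmatch so the whole string has to match, and requires every label to be non-empty. The names LABEL, DOMAIN_RE, is_valid_domain and to_ascii are made up for illustration. For non-ASCII names, one option (an assumption here, not something the answer prescribes) is to convert to the ASCII form with Python's built-in idna codec before checking.

```python
import re

# Label rule from the answer: letters, digits and dashes, 1 to 63 characters.
# re.ASCII stops \d from matching non-ASCII digits.
LABEL = r'[a-zA-Z\d-]{1,63}'
DOMAIN_RE = re.compile(rf'{LABEL}(\.{LABEL})*', re.ASCII)


def is_valid_domain(name):
    """Return True if the whole string is a dot-separated list of valid labels."""
    return DOMAIN_RE.fullmatch(name) is not None


def to_ascii(name):
    """Convert a possibly non-ASCII name to its ASCII form via the 'idna' codec."""
    return name.encode('idna').decode('ascii')
```

With this, is_valid_domain("exa_mple.com") is rejected because of the underscore, and a name such as "bücher.de" can be passed through to_ascii first and then checked like any other name.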




Answer 2:


r'^(?=.{4,255}$)([a-zA-Z0-9]([a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z0-9]{2,6}$'
  • The lookahead makes sure that the whole name has a minimum of 4 (a.in) and a maximum of 255 characters.
  • One or more labels (each followed by a period) of length 1 to 63, starting and ending with an alphanumeric character, with alphanumeric characters and hyphens allowed in the middle.
  • Followed by a top-level domain of 2 to 6 characters (6 being the length of museum). Newer TLDs can be longer, so raise the upper bound if you need to accept them.
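A quick way to sanity-check a pattern like this is to run it against a few names. The sketch below assumes a version of the pattern in which a label may be a single character and the TLD may be 2 to 6 characters long, so that a.in and example.museum both pass. DOMAIN_RE and is_valid_domain are illustrative names.

```python
import re

# Assumed form of the pattern: 1-63 character labels, 2-6 character TLD.
DOMAIN_RE = re.compile(
    r'^(?=.{4,255}$)'                                   # overall length check
    r'([a-zA-Z0-9]([a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+'  # one or more labels
    r'[a-zA-Z0-9]{2,6}$'                                # top-level domain
)


def is_valid_domain(name):
    """Return True if the name matches the pattern above."""
    return DOMAIN_RE.match(name) is not None


for candidate in ["a.in", "example.museum", "-bad.com", "bad-.com"]:
    print(candidate, is_valid_domain(candidate))
```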



Answer 3:


Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):

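from socket import getaddrinfo

# Ask the resolver for the name's addresses; raises socket.gaierror if the lookup fails
result = getaddrinfo("www.google.com", None)
# Each entry is (family, type, proto, canonname, sockaddr); print the first sockaddr
print(result[0][4])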

Note that technically this can leave you open to DoS (if someone submits thousands of invalid domain names, it can take a while to resolve invalid names) but you could simply rate-limit someone who tries this.

The advantage of this is that it will reject "hotmail.con" (a typo for "hotmail.com", say), whereas a regex would accept "hotmail.con" as valid.
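To turn the lookup into a yes/no check, the call can be wrapped so that a failed resolution returns False instead of raising. This is only a sketch: the function name resolves and its resolver parameter are hypothetical, and the parameter exists purely so the function can be exercised without network access.

```python
import socket


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo and can be swapped for a stub.
    """
    try:
        return len(resolver(name, None)) > 0
    except socket.gaierror:
        # The lookup itself failed (for example, the name does not exist).
        return False
    except UnicodeError:
        # The name could not be encoded for lookup, which may happen with
        # empty or over-long labels.
        return False
```

A real call such as resolves("hotmail.con") depends on the local resolver and network, so it is worth pairing with the rate limiting mentioned above.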




Answer 4:


I've been using this:

(r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})')

This ensures that the name follows either a dot (as in www.) or a slash (as in http://), that a dash occurs only inside the name, and that two-part suffixes such as gov.uk are matched too.
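Since this pattern expects a leading dot or slash, it is meant to be searched for inside a longer string rather than matched against a bare name. A small sketch of that usage follows; the names DOMAIN_IN_TEXT and find_domain are hypothetical, and the pattern is copied unchanged, only split across two string literals.

```python
import re

# Pattern from the answer, split over two literals for readability.
DOMAIN_IN_TEXT = re.compile(
    r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}'
    r'\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})'
)


def find_domain(text):
    """Return the first match without its leading '.' or '/', or None."""
    match = DOMAIN_IN_TEXT.search(text)
    return match.group(0)[1:] if match else None


print(find_domain("http://example.co.uk/page"))
print(find_domain("www.example.com"))
```

One thing to watch for: the pattern has no trailing boundary, so for "http://www.example.com" the search starts at the slash, takes www as the name, and stops after the first six letters of example, giving "www.exampl".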




Answer 5:


The other answers are all fairly out of date with respect to the spec at this point. I believe the pattern below matches the current spec correctly:

r'^(?=.{1,253}$)(?!.*\.\..*)(?!\..*)([a-zA-Z0-9-]{,63}\.){,127}[a-zA-Z0-9-]{1,63}$'
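The length limits in this pattern are easiest to see at the boundaries. The sketch below compiles the pattern as given and builds a name of exactly 253 characters to probe the overall limit. HOSTNAME_RE, is_valid_hostname and longest are illustrative names.

```python
import re

# Pattern from the answer, split over two literals for readability.
HOSTNAME_RE = re.compile(
    r'^(?=.{1,253}$)(?!.*\.\..*)(?!\..*)'
    r'([a-zA-Z0-9-]{,63}\.){,127}[a-zA-Z0-9-]{1,63}$'
)


def is_valid_hostname(name):
    """Return True if the name matches the pattern above."""
    return HOSTNAME_RE.match(name) is not None


# Three 63-character labels plus one 61-character label: 253 characters in total.
longest = ".".join(["a" * 63] * 3 + ["a" * 61])

print(len(longest), is_valid_hostname(longest))
print(len(longest + "a"), is_valid_hostname(longest + "a"))
```

Note that this pattern does not restrict where hyphens appear within a label, so a name such as "-example.com" is accepted. Combine it with a per-label check if leading and trailing hyphens need to be rejected.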


Source: https://stackoverflow.com/questions/2894902/check-for-a-valid-domain-name-in-a-string
