问题
I am using python and would like a simple api or regex to check for a domain name's validity. By validity I am the syntactical validity and not whether the domain name actually exists on the Internet or not.
回答1:
Any domain name is (syntactically) valid if it's a dot-separated list of identifiers, each no longer than 63 characters, and made up of letters, digits and dashes (no underscores).
So:
r'[a-zA-Z\d-]{,63}(\.[a-zA-Z\d-]{,63})*'
would be a start. Of course, these days some non-Ascii characters may be allowed (a very recent development) which changes the parameters a lot -- do you need to deal with that?
回答2:
r'^(?=.{4,255}$)([a-zA-Z0-9][a-zA-Z0-9-]{,61}[a-zA-Z0-9]\.)+[a-zA-Z0-9]{2,5}$'
- Lookahead makes sure that it has a minimum of 4 (
a.in
) and a maximum of 255 characters - One or more labels (separated by periods) of length between 1 to 63, starting and ending with alphanumeric characters, and containing alphanumeric chars and hyphens in the middle.
- Followed by a top level domain name (whose max length is 5 for museum)
回答3:
Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):
from socket import getaddrinfo
result = getaddrinfo("www.google.com", None)
print result[0][4]
Note that technically this can leave you open to DoS (if someone submits thousands of invalid domain names, it can take a while to resolve invalid names) but you could simply rate-limit someone who tries this.
The advantage of this is that it'll catch "hotmail.con" as invalid (instead of "hotmail.com", say) whereas a regex would say "hotmail.con" is valid.
回答4:
I've been using this:
(r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})')
to ensure it follows either after dot (www.) or / (http://) and the dash occurs only inside the name and to match suffixes such as gov.uk too.
回答5:
The answers are all pretty outdated with the spec at this point. I believe the below will match the current spec correctly:
r'^(?=.{1,253}$)(?!.*\.\..*)(?!\..*)([a-zA-Z0-9-]{,63}\.){,127}[a-zA-Z0-9-]{1,63}$'
来源:https://stackoverflow.com/questions/2894902/check-for-a-valid-domain-name-in-a-string