Number in the top-level domain?

旧时难觅i 2021-01-03 20:49

Can top-level domains contain a number at the end? Idk nothing about DNS rules etc but when I try to use PHP's filter_var() function with FILTER_VALIDATE_EMAIL for te

3 Answers
  • 2021-01-03 21:10

    Can top-level domains contain a number at the end?

    Technically yes, with one exception: a purely numeric label cannot be a TLD under current rules, for a reason that is easy to understand (it would be ambiguous with IP addresses). In practice, however, a TLD cannot contain a digit at all, including at the end, unless it is an IDN TLD, because of rules enforced by ICANN.

    Let us go back to some RFCs to have some clearer definitions of things:

    RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

    This is the definition of an Internet "hostname" back then:

    A "name" (Net, Host, Gateway, or Domain name) is a text string up
    to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
    sign (-), and period (.). Note that periods are only allowed when
    they serve to delimit components of "domain style names". (See
    RFC-921, "Domain Name System Implementation Schedule", for
    background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

    Note that this also has the following:

    Single character names or nicknames are not allowed.

    Hence at that point:

    • com1 is a valid TLD
    • 3com is not ("The first character must be an alpha character.")
    • 42 is not (same reason)
    • 1 is not (same reason)
    • a is not ("Single character names or nicknames are not allowed.")
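    The RFC 952 rules quoted above can be sketched as a small check. This is only an illustration: the function name is_rfc952_name is hypothetical, and the sketch does not verify that periods are used only to delimit components.

```python
import string

# Characters RFC 952 allows in a name: letters, digits, minus sign, period.
ALLOWED = set(string.ascii_letters + string.digits + "-.")


def is_rfc952_name(name: str) -> bool:
    """Apply the RFC 952 rules quoted above (hypothetical helper)."""
    # Single-character names are not allowed; the maximum is 24 characters.
    if not 2 <= len(name) <= 24:
        return False
    # Only letters, digits, minus sign and period may appear.
    if any(ch not in ALLOWED for ch in name):
        return False
    # The first character must be a letter.
    if not name[0].isalpha():
        return False
    # The last character must not be a minus sign or a period.
    return name[-1] not in "-."
```

    With this sketch, com1 is accepted, while 3com, 42, 1 and a are rejected, matching the list above.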

    RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES (November 1987)

    This is one of the RFCs that created the DNS as we know it today. For compatibility reasons it defined hostnames as a sequence of labels, where a label is defined as follows:

    They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

    The TLD is one label among others (the L in TLD stands for "level", but it is a label all the same). Per the above rule, com1 is a valid label, and hence a valid TLD, whereas 3com would not have been. This brings us directly to the following amendment.

    RFC 1123: Requirements for Internet Hosts -- Application and Support (October 1989)

    This amends the previous RFC by changing one rule:

    The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

    So at that point:

    • com1 is a valid TLD
    • 3com is also valid
    • 42 is valid
    • 1 is valid
    • a is valid
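    As a minimal sketch, the RFC 1034 label rule and the RFC 1123 relaxation of the first character can be written as two regular expressions. The names RFC1034_LABEL, RFC1123_LABEL and is_label are made up for this illustration.

```python
import re

# RFC 1034: start with a letter, end with a letter or digit,
# interior characters are letters, digits or hyphen, 63 characters at most.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123: same rule, except the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_label(label: str, relaxed: bool = True) -> bool:
    """Check one label; relaxed=True applies the RFC 1123 syntax."""
    pattern = RFC1123_LABEL if relaxed else RFC1034_LABEL
    return pattern.fullmatch(label) is not None
```

    Under the relaxed syntax all five examples above pass; with relaxed=False, 3com, 42 and 1 are rejected again.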

    For the case of "numerical" TLDs, the following rules in the same document (RFC 1123) apply:

    Whenever a user inputs the identity of an Internet host, it SHOULD be possible to enter either (1) a host domain name or (2) an IP address in dotted-decimal ("#.#.#.#") form. The host SHOULD check the string syntactically for a dotted-decimal number before looking it up in the Domain Name System.

    and

    If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
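    The syntactic check described in these two quotes could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, and it only tests the #.#.#.# shape without checking that each number fits in an octet.

```python
def looks_like_dotted_decimal(text: str) -> bool:
    """Return True if text has the form #.#.#.# (four all-digit groups)."""
    parts = text.split(".")
    return len(parts) == 4 and all(p.isascii() and p.isdigit() for p in parts)


def classify_host_input(text: str) -> str:
    """Decide, before any DNS lookup, how to treat what the user typed."""
    return "ip-address" if looks_like_dotted_decimal(text) else "domain-name"
```

    A name such as 74.125.244.com1 is classified as a domain name, because its rightmost label is not all digits. If a TLD such as 123 existed, 74.125.244.123 would always be taken for an IP address by a check of this kind.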

    RFC 1738: Uniform Resource Locators (URL) (December 1994)

    This also addresses the TLD, stating:

    The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

    RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

    This was needed to introduce IDNs (Internationalized Domain Names) and it has this to say:

    Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all-numeric.

    In fact, as soon as IDNs are involved, and there are IDN TLDs (both ccTLDs and gTLDs now), the chosen encoding produces an ASCII string of the form xn--something, where the something can contain digits, including at the end, as shown in other answers.

    However, it is not really clear where the "additional rule" in the last sentence of the quote comes from.
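    To illustrate the xn--something form mentioned above, here is a small sketch using Python's built-in "idna" codec. Note the assumption: that codec implements the older IDNA 2003 rules, so it is a convenient approximation rather than the exact procedure registries apply. The helper names are hypothetical.

```python
def to_a_label(u_label: str) -> str:
    """Convert a Unicode label to its ASCII (xn--) form.

    Uses Python's built-in "idna" codec (IDNA 2003), which is only an
    approximation of the current IDNA rules.
    """
    return u_label.encode("idna").decode("ascii")


def contains_digit(label: str) -> bool:
    """Return True if any character of the label is a digit."""
    return any(ch.isdigit() for ch in label)
```

    For example, to_a_label("рф") gives xn--p1ai, an ASCII label with a digit in it, even though the Unicode form has none.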

    RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

    Not defining anything, but providing some interesting facts:

    The root name servers receive a significant number of A record queries where the QNAME looks like an IPv4 address.

    and

    A possible solution is to delegate these numeric TLDs from the root zone to a separate set of servers to absorb the traffic.

    This clearly shows that applications in the wild do send queries for names formatted like IPv4 addresses, that is, with a fully numeric "TLD". They may do so by mistake, but it at least shows that such queries work technically.

    There was in fact an experiment to launch a .42 registry, completely outside of the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

    It did not go far, even though it technically worked.

    It showed, for example, that Microsoft operating systems did not by default accept purely numeric TLDs at all, although a patch was provided for that: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput "When you try to join a Windows Vista-based client computer to a top level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design."

    Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

    This finally gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though that is not a clear consequence of the specifications above:

    (section 2.1 below refers to content in RFC 1123, quoted above)

    In addition, the DISCUSSION section of Section 2.1 says:

     'However, a valid host name can never have the dotted-decimal form
     #.#.#.#, since at least the highest-level component label will be
     alphabetic.'  [Section 2.1]
    

    Some implementers may have understood the above phrase 'will be alphabetic' to be a protocol restriction.

    But it basically just recommends going with the flow and keeping the same restrictions:

    Neither [RFC0952] nor [RFC1123] explicitly states the reasons for these restrictions. It might be supposed that human factors were a consideration; [RFC1123] appears to suggest that one of the reasons was to prevent confusion between dotted-decimal IPv4 addresses and host domain names. In any case, it is reasonable to believe that the restrictions have been assumed in some deployed software, and that changes to the rules should be undertaken with caution.

    Hence it offered this definition:

    traditional-tld-label = 1*63(ALPHA)
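    Read as ABNF, this production means "between 1 and 63 ASCII letters". A direct translation into a regular expression might look like this (the Python names are hypothetical):

```python
import re

# traditional-tld-label = 1*63(ALPHA), where ALPHA is A-Z / a-z
TRADITIONAL_TLD_LABEL = re.compile(r"[A-Za-z]{1,63}")


def is_traditional_tld_label(label: str) -> bool:
    """Return True if the label matches the draft's production."""
    return TRADITIONAL_TLD_LABEL.fullmatch(label) is not None
```

    Under this definition com is accepted, while com1, 3com and 42 are all rejected.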

    This draft never became an RFC because not everyone agreed with it. You can find a thread with dissenting voices at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html ; basically, it was unclear whether there had been a restriction in the past that was now being relaxed a little, or whether there never was a restriction to begin with and people had implemented systems wrongly.

    For example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405 Browsing failed for a TLD that started with a digit or was purely numeric (it worked if the TLD ended with a digit preceded by letters). This was not considered a bug and was not fixed, because the browser ships with a list of TLDs, so besides checking their syntax it knows which ones are valid and which are not.

    ICANN Application Guidebook for new TLDs (June 2012)

    Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf it says the following starting at page 64:

    The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clarifications to the DNS Specification (RFC 2181) and any updates thereto.

    The ASCII label must be a valid host name, as specified in the technical standards DOD Internet Host Table Specification (RFC 952), Requirements for Internet Hosts — Application and Support (RFC 1123), and Application Techniques for Checking and Transformation of Names (RFC 3696), Internationalized Domain Names in Applications (IDNA)(RFCs 5890-5894), and any updates thereto. This includes the following:

    The ASCII label must consist entirely of letters (alphabetic characters a-z), or

    The label must be a valid IDNA A-label (further restricted as described in Part II below).

    Note especially: The ASCII label must consist entirely of letters (alphabetic characters a-z)

    This immediately rules out any fully numeric TLD, and in fact any digit at all, including at the end, except for IDN TLDs, the ones of the form xn--something.
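    The two alternatives in the guidebook text can be sketched as a small classifier. This is a loose illustration, not ICANN's actual procedure: the names are hypothetical, and the second branch only checks that a label has the shape of an A-label (the xn-- prefix followed by letters, digits and hyphens), without performing real IDNA validation.

```python
import re

# First alternative: the label consists entirely of letters a-z.
LETTERS_ONLY = re.compile(r"[a-z]{1,63}")

# Second alternative, shape only: "xn--" then LDH characters, 63 at most.
# Assumption: real A-label validity (RFCs 5890-5894) is NOT checked here.
A_LABEL_SHAPE = re.compile(r"xn--[a-z0-9-]{0,58}[a-z0-9]")


def classify_tld_label(label: str) -> str:
    """Return "letters", "a-label-shaped" or "rejected" for a TLD label."""
    label = label.lower()
    if LETTERS_ONLY.fullmatch(label):
        return "letters"
    if A_LABEL_SHAPE.fullmatch(label):
        return "a-label-shaped"
    return "rejected"
```

    With this sketch, com falls in the first group, XN--P1AI in the second, and com1, 3com and 123 are rejected.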

    Note that someone asked ICANN directly about this and got the following reply, shown at https://domaingang.com/domain-news/icann-applicant-handbook-this-is-why-we-cannot-have-numeric-gtlds/ :

    Please note Numeric TLD’s were prohibited in the first round of applications. The prohibition on numeric gTLDs in the applicant guidebook (http://newgtlds.icann.org/en/applicants/agb) derives from a number of technical concerns regarding the ability of such domains to operate properly. Domain names are often used in place where other kinds of identifiers may be used like IP addresses.

    The fact that a TLD is all alphabetic is often a key determinant for software in identifying a domain name. If a TLD such as “.123” were allowed, you could have a domain name of “74.125.244.123” which would be difficult to discriminate from an IP address “74.125.244.123.”. There are also other considerations: some technical standards documentation states that TLDs will be alphabetical, which has been codified as an assumption in software also.

    The limitation in the AGB to alphabetic characters was designed to limit these scenarios that means such TLDs are not likely to work well in software, as well as limit potential security issues that may result from the same issues.

  • 2021-01-03 21:19

    Conceptually, there is nothing that disallows numbers in a TLD and in the future, who knows, perhaps there will be numeric TLDs.

    There are no TLDs at the moment that do have numbers in them - the function probably does not test against a list of known TLDs (as it is subject to change), but lexically.

  • 2021-01-03 21:22

    Actually there are quite a few TLDs currently in use that contain numbers:

    XN--1QQW23A
    XN--3BST00M
    XN--3DS443G
    XN--3E0B707E
    XN--45BRJ9C
    XN--4GBRIM
    XN--55QW42G
    XN--55QX5D
    XN--6FRZ82G
    XN--6QQ986B3XL
    XN--80ADXHKS
    XN--80AO21A
    XN--80ASEHDB
    XN--80ASWG
    XN--90A3AC
    XN--C1AVG
    XN--CG4BKI
    XN--CLCHC0EA0B2G2A9GCD
    XN--CZR694B
    XN--CZRU2D
    XN--D1ACJ3B
    XN--FIQ228C5HS
    XN--FIQ64B
    XN--FIQS8S
    XN--FIQZ9S
    XN--FPCRJ9C3D
    XN--FZC2C9E2C
    XN--GECRJ9C
    XN--H2BRJ9C
    XN--I1B6B1A6A2E
    XN--IO0A7I
    XN--J1AMH
    XN--J6W193G
    XN--KPRW13D
    XN--KPRY57D
    XN--KPUT3I
    XN--L1ACC
    XN--LGBBAT1AD8J
    XN--MGB9AWBF
    XN--MGBA3A4F16A
    XN--MGBAAM7A8H
    XN--MGBAB2BD
    XN--MGBAYH7GPA
    XN--MGBBH1A71E
    XN--MGBC0A9AZCG
    XN--MGBERP4A5D4AR
    XN--MGBX4CD0AB
    XN--NGBC5AZD
    XN--NQV7F
    XN--NQV7FS00EMA
    XN--O3CW4H
    XN--OGBPF8FL
    XN--P1AI
    XN--PGBS0DH
    XN--Q9JYB4C
    XN--RHQV96G
    XN--S9BRJ9C
    XN--SES554G
    XN--UNUP4Y
    XN--VHQUV
    XN--WGBH1C
    XN--WGBL6A
    XN--XHQ521B
    XN--XKC2AL3HYE2A
    XN--XKC2DL3A5EE0H
    XN--YFRO4I67O
    XN--YGBI2AMMX
    XN--ZFR164B
    

    You can see an up-to-date list at data.iana.org/TLD/tlds-alpha-by-domain.txt or a list with descriptions at swcs.com.au/tld.htm
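    A list like the one above can be extracted from the IANA file with a few lines of code. The sketch below assumes the file's plain-text layout (one TLD per line, comment lines starting with #) and works on text you have already downloaded. The function name and the SAMPLE excerpt are made up for illustration.

```python
def tlds_with_digits(listing: str) -> list[str]:
    """Return the entries of a TLD listing that contain at least one digit."""
    found = []
    for line in listing.splitlines():
        entry = line.strip()
        # Skip blank lines and comment lines such as the version header.
        if not entry or entry.startswith("#"):
            continue
        if any(ch.isdigit() for ch in entry):
            found.append(entry)
    return found


# Hypothetical excerpt in the same layout as the IANA file.
SAMPLE = """# example header line
COM
ORG
XN--P1AI
XN--FIQS8S
"""
```

    Calling tlds_with_digits(SAMPLE) keeps only the two xn-- entries, which is consistent with the observation that the digits all come from IDN TLDs.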
