Which characters make a URL invalid?

小蘑菇 2020-11-21 05:03

Are these valid URLs?

  • example.com/file[/].html
  • http://example.com/file[/].html
10 Answers
  •  爱一瞬间的悲伤
    2020-11-21 05:38

    Several Unicode character ranges are valid in HTML5 URLs, although it might still not be a good idea to use them.

    For example, the documentation for the href attribute (http://www.w3.org/TR/html5/links.html#attr-hyperlink-href) says:

    The href attribute on a and area elements must have a value that is a valid URL potentially surrounded by spaces.

    Then the definition of "valid URL" points to http://url.spec.whatwg.org/, which says it aims to:

    Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process.

    That document defines URL code points as:

    ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFFD, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E1000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.
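
    The definition above can be turned into a small predicate. This is only a sketch that transcribes the ranges exactly as quoted in this answer; the name is_url_code_point is made up for illustration, and the current text of the spec may list different ranges.

```python
import string

# ASCII characters named in the quoted definition.
_ASCII_URL_CHARS = frozenset(
    string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~"
)

# Non-ASCII ranges from the quoted definition (inclusive bounds).
_URL_RANGES = [(0x00A0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]
# Planes 1 through 13: U+10000..U+1FFFD up to U+D0000..U+DFFFD.
_URL_RANGES += [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)]
_URL_RANGES += [(0xE1000, 0xEFFFD), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def is_url_code_point(ch: str) -> bool:
    """Return True if ch is in the quoted "URL code points" set (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in _ASCII_URL_CHARS:
        return True
    code = ord(ch)
    return any(low <= code <= high for low, high in _URL_RANGES)
```

    With this sketch, is_url_code_point("[") is False and is_url_code_point("你") is True. Note that "%" is not in the set either; the spec treats it separately.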

    The term "URL code points" is then used in the statement:

    If c is not a URL code point and not "%", parse error.

    in several parts of the parsing algorithm, including the scheme, authority, relative path, query, and fragment states: so basically the entire URL.
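
    As a rough illustration of that rule, the sketch below scans a string and reports every character that is neither a URL code point nor "%". It assumes the ranges quoted in this answer, the function name flagged_characters is hypothetical, and it deliberately ignores the rest of the parsing algorithm (state-specific rules, host parsing, checks on what follows a "%"), so it is a character check, not a URL validator.

```python
import string

# ASCII characters from the quoted "URL code points" definition.
_ASCII_OK = frozenset(string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~")

# Non-ASCII ranges from the same definition (inclusive bounds).
_RANGES = [(0x00A0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]
_RANGES += [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)]
_RANGES += [(0xE1000, 0xEFFFD), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def flagged_characters(url: str) -> list:
    """Return (index, character) pairs that match the quoted parse-error rule."""
    flagged = []
    for index, ch in enumerate(url):
        if ch == "%" or ch in _ASCII_OK:
            continue  # "%" is exempt from this rule; listed ASCII is allowed
        code = ord(ch)
        if any(low <= code <= high for low, high in _RANGES):
            continue  # allowed non-ASCII code point
        flagged.append((index, ch))
    return flagged


# The question's first example: only the square brackets are flagged.
print(flagged_characters("example.com/file[/].html"))
```

    Under these assumptions the square brackets in the question's example are the only flagged characters, while a string such as "你好" is not flagged at all.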

    Also, the validator at http://validator.w3.org/ accepts URLs like "你好" and rejects URLs that contain characters such as spaces, e.g. "a b".

    Of course, as mentioned by Stephen C, it is not just about characters but also about context: you have to understand the entire algorithm. But since the class "URL code points" is used at key points of the algorithm, it gives a good idea of what you can and cannot use.

    See also: Unicode characters in URLs
