Can HTTP URIs have non-ASCII characters?

Posted by 限于喜欢 on 2019-11-29 03:07:38

No, they are not allowed. Just check the ABNF in RFC 3986: every character it permits in a URI is ASCII, so anything else has to be percent-encoded.

Here is an example: ☃.net.

In terms of the relevant section of RFC 3986, I think you are looking at 2.5.
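To make the two points above concrete, here is a minimal Python sketch. It assumes that a character-level check is enough for illustration (it does not validate the full RFC 3986 grammar), and the helper name uses_only_uri_characters and the sample path are made up for this example. The percent-encoding step uses the standard library's urllib.parse.quote, which encodes non-ASCII characters as UTF-8 octets.

```python
from urllib.parse import quote, unquote

# Character classes from the RFC 3986 ABNF. All of them are ASCII.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
# "%" is included because it introduces a percent-encoded octet.
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}


def uses_only_uri_characters(text: str) -> bool:
    """Hypothetical helper: True if every character may appear in a URI.

    This is only a character-level check, not a full ABNF parse.
    """
    return all(ch in ALLOWED for ch in text)


# A made-up path containing the snowman character from the example above.
snowman_path = "/wiki/☃"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
# and leaves "/" alone by default.
encoded_path = quote(snowman_path)

print(uses_only_uri_characters(snowman_path))  # False
print(encoded_path)                            # /wiki/%E2%98%83
print(uses_only_uri_characters(encoded_path))  # True
```

Decoding with unquote(encoded_path) gives back the original string, so nothing is lost by the encoding.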

EDIT:

Apparently Stack Overflow doesn't detect this as a proper URL. You'll have to copy and paste it into your browser.

It used to be that non-English characters were not allowed in DNS names or in URLs/URIs. There was a hack that allowed them in URIs by using % encoding. However, many countries, such as Russia and China, are starting to implement DNS using non-Latin characters. Here is a reference to one of these standards

RFC 3986 is complemented, not replaced, by RFC 3987, which defines Internationalized Resource Identifiers (IRIs). IRIs fully support Unicode, and RFC 3987 provides mapping rules to and from RFC 3986-style URIs.
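A rough sketch of the IRI-to-URI direction of that mapping, in Python, might look like the following. The function name iri_to_uri is made up for this example, and the sketch makes simplifying assumptions: it ignores userinfo, it converts the host with Python's built-in "idna" codec (which implements the older IDNA 2003 rules), and it percent-encodes the remaining components as UTF-8. It is an illustration of the idea rather than a complete RFC 3987 implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" is kept so that
# escapes already present in the input are not encoded a second time.
PATH_SAFE = "/%:@!$&'()*+,;=~"
QUERY_SAFE = PATH_SAFE + "?"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII form ("xn--..." when needed).
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Assumption: userinfo (user:password@) is dropped for simplicity.

    # Everything else: percent-encode the UTF-8 bytes of non-ASCII characters.
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    fragment = quote(parts.fragment, safe=QUERY_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/straße?q=ü"))
# http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%BC
```

An input that is already a plain ASCII URI passes through unchanged, which is what you would expect from a mapping that only touches non-ASCII characters.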

Many browsers now support URIs with Unicode characters (I've implemented them on a website I've built called blogvani.com), and Google duly scans them and keeps them intact. I don't think that works on top-level domains though, at least not with the registrar and not directly.

For top-level domains, if you have a domain registered in Unicode (for example, people can register domains in Hindi), it will be converted to a corresponding code in ASCII (something that may look like jdhfks3243-32434.com)...

It is quite funny to see how this is routed and to realize that you're not actually going to a Unicode domain even though it seems like that.
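The ASCII form described above is produced by Punycode, and real encoded labels start with the prefix "xn--". As a small sketch, Python's built-in "idna" codec can show the conversion in both directions. The two helper names are made up for this example, and the codec implements the older IDNA 2003 rules, so treat the result as illustrative rather than as what every registrar or browser does.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Hypothetical helper: Unicode hostname -> ASCII form used in DNS."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Hypothetical helper: ASCII ("xn--") hostname -> Unicode form."""
    return hostname.encode("ascii").decode("idna")


# The snowman domain from the example earlier in the thread.
print(to_ascii_hostname("☃.net"))          # xn--n3h.net
print(to_unicode_hostname("xn--n3h.net"))  # ☃.net

# Labels that are already ASCII are left as they are.
print(to_ascii_hostname("example.com"))    # example.com
```

So when you type ☃.net into a browser, the name that is actually looked up in DNS is xn--n3h.net.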
