Characters allowed in a URL

前端未结

关注

 9  942

Does anyone know the full list of characters that can be used within a GET without being encoded? At the moment I am using A-Z a-z and 0-9... but I am looking to find out th

相关标签:

9条回答

灰色年华

2020-11-22 06:29

These are listed in RFC3986. See the Collected ABNF for URI to see what is allowed where and the regex for parsing/validation.

0 讨论(0)
发布评论:

提交评论
- 加载中...
伪装坚强ぢ

2020-11-22 06:33
RFC3986 defines two sets of characters you can use in a URI:
- Reserved Characters: :/?#[]@!$&'()*+,;=
  
  reserved = gen-delims / sub-delims
  
  gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
  
  sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  
  The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
- Unreserved Characters: A-Za-z0-9-_.~
  
  unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  
  Characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
0 讨论(0)
发布评论:

提交评论
- 加载中...
广开言路

2020-11-22 06:38
I tested it by requesting my website (apache) with all available chars on my german keyboard as URL parameter:
```
http://example.com/?^1234567890ß´qwertzuiopü+asdfghjklöä#<yxcvbnm,.-°!"§$%&/()=? `QWERTZUIOPÜ*ASDFGHJKLÖÄ\'>YXCVBNM;:_²³{[]}\|µ@€~
```
These were not encoded:
```
^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-!/()=?`*;:_{}[]\|~
```
Not encoded after urlencode():
```
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_
```
Not encoded after rawurlencode():
```
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~
```
Note: Before PHP 5.3.0 rawurlencode() encoded ~ because of RFC 1738. But this was replaced by RFC 3986 so its safe to use, now. But I do not understand why for example {} are encoded through rawurlencode() because they are not mentioned in RFC 3986.

An additional test I made was regarding auto-linking in mail texts. I tested Mozilla Thunderbird, aol.com, outlook.com, gmail.com, gmx.de and yahoo.de and they fully linked URLs containing these chars:
```
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~+#,%&=*;:@
```
Of course the ? was linked, too, but only if it was used once.

Some people would now suggest to use only the rawurlencode() chars, but did you ever hear that someone had problems to open these websites?

Asterisk
http://wayback.archive.org/web/*/http://google.com

Colon
https://en.wikipedia.org/wiki/Wikipedia:About

Plus
https://plus.google.com/+google

At sign, Colon, Comma and Exclamation mark
https://www.google.com/maps/place/USA/@36.2218457,...

Because of that these chars should be usable unencoded without problems. Of course you should not use &; because of encoding sequences like &. The same reason is valid for % as it used to encode chars in general. And = as it assigns a value to a parameter name.

Finally I would say its ok to use these unencoded:
```
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-_~!+,*:@
```
But if you expect randomly generated URLs you should not use .!, because those mark the end of a sentence and some mail apps will not auto-link the last char of the url. Example:
```
Visit http://example.com/foo=bar! !
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2