Which characters make a URL invalid?
Are these valid URLs?
example.com/file[/].html
http://example.com/file[/].html
<
I am implementing old http (0.9, 1.0, 1.1) request and response reader/writer. Request URI is the most problematic place.
You can't just use RFC 1738, 2396 or 3986 as it is. There are many old HTTP clients and servers that allows more characters. So I've made research based on accidentally published webserver access logs: "GET URI HTTP/1.0" 200
.
I've found that the following non-standard characters are often used in URI:
\ { } < > | ` ^ "
These characters were described in RFC 1738 as unsafe.
If you want to be compatible with all old HTTP clients and servers - you have to allow these characters in request URI.
Please read more information about this research in oghttp-request-collector.