Which characters make a URL invalid?
Are these valid URLs?
example.com/file[/].html
http://example.com/file[/].html
I need to pick a delimiter character for splitting URLs in a string, so I decided to build my own list of characters that cannot appear in a URL:
>>> allowed = "-_.~!*'();:@&=+$,/?%#[]?@ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
>>> from string import printable
>>> ''.join(set(printable).difference(set(allowed)))
'`" <\x0b\n\r\x0c\\\t{^}|>'
So the possible choices are newline, tab, space, backslash, and any of "<>{}^|` (plus the vertical tab and form feed control characters from the output above). I guess I'll go with the space or newline. :)
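For what it's worth, here is a minimal sketch of how that set could be used to do the actual splitting. It assumes the same allowed list as above (with the duplicate "?" and "@" dropped); the names ALLOWED, DISALLOWED and split_urls are made up for this example, and the sample input is invented too.

```python
import re
from string import printable

# Characters treated as allowed in a URL, copied from the list above
# (duplicates removed). This is the post's own list, not a checked spec.
ALLOWED = (
    "-_.~!*'();:@&=+$,/?%#[]"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
)

# Printable ASCII characters outside the allowed set, sorted so the
# result is stable (a plain set has no guaranteed order).
DISALLOWED = "".join(sorted(set(printable) - set(ALLOWED)))

def split_urls(text):
    """Hypothetical helper: split text on runs of disallowed characters."""
    pattern = "[" + re.escape(DISALLOWED) + "]+"
    # Drop empty strings produced by leading or trailing delimiters.
    return [part for part in re.split(pattern, text) if part]

# Invented sample input: three URL-like tokens separated by space and newline.
sample = "http://example.com/a http://example.com/file[/].html\nexample.org\n"
urls = split_urls(sample)
print(urls)
```

Splitting on any disallowed character, rather than only on space or newline, means a stray tab or quote between URLs is handled as well. Whether every character in ALLOWED is really valid in every part of a URL is a separate question that this sketch does not check.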