I know many people encountered this error before but I couldn\'t find the solution to my problem.
I have a URL that I want to normalize:
url = u\"ht
urllib.quote()
does not properly parse Unicode. To get around this, you can call the .encode()
method on the url when reading it (or on the variable you read from the database). So run url = url.encode('utf-8')
. With this you get:
import urllib
import urlparse
from urlparse import urlsplit
url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path,safe="%/")
and then your output for the path
variable will be:
>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'
Does this work?