I know many people have encountered this error before, but I couldn't find a solution to my problem.
I have a URL that I want to normalize:
```python
import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")
```
This gives an error message:
```
/usr/lib64/python2.6/urllib.py:1236: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "url_normalization.py", line 246, in <module>
    logging.info(get_canonical_url(url))
  File "url_normalization.py", line 102, in get_canonical_url
    path = urllib.quote(path,safe="%/")
  File "/usr/lib64/python2.6/urllib.py", line 1236, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xc3'
```
If I remove the `u` prefix from the URL literal, so that it is a plain byte string, the error goes away. But how can I get rid of the unicode type automatically, given that I read the URL directly from a database?
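A likely explanation, judging from the traceback: in Python 2.6, `urllib.unquote()` applied to a `unicode` string turns each percent-escape into its own code point, so `%C3%A4` (the two UTF-8 bytes of `ä`) becomes `u'\xc3\xa4'`. `urllib.quote()` then looks every character up in a table built from byte characters, which has no entry for a non-ASCII `unicode` character, hence `KeyError: u'\xc3'`. Working on UTF-8 bytes avoids this no matter what type the database returns. The sketch below assumes the stored URLs are UTF-8 percent-encoded; `normalize_path` is a hypothetical helper name, and the import fallback is only there so the same code runs on Python 2 and Python 3.

```python
try:
    # Python 3: unquote_to_bytes() always returns bytes
    from urllib.parse import urlsplit, quote, unquote_to_bytes
except ImportError:
    # Python 2: unquote() on a byte string already returns bytes
    from urlparse import urlsplit
    from urllib import quote, unquote as unquote_to_bytes


def normalize_path(url):
    """Return the re-quoted path of url, accepting unicode or byte input."""
    scheme, host_port, path, query, fragment = urlsplit(url)
    # Work on UTF-8 bytes so every percent-escape maps to exactly one byte.
    if not isinstance(path, bytes):
        path = path.encode("utf-8")
    raw = unquote_to_bytes(path)
    # quote() on bytes escapes each non-safe byte, e.g. b"\xc3" -> "%C3".
    return quote(raw, safe="%/")


url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
print(normalize_path(url))  # /Dienste/Fachbeitr%C3%A4ge.aspx
```

On Python 2.6 alone, the same effect can be had by encoding once, right after reading the value from the database: `url = url.encode("utf-8")` whenever `isinstance(url, unicode)`.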