Simple ASCII URL encoding with Python

不思量自难忘° 2021-01-16 12:23

Look at this:

import urllib
print urllib.urlencode(dict(bla='Ã'))

The output is:

bla=%C3%BC

What I want is the output in ASCII instead of UTF-8.

6 Answers
  • 2021-01-16 12:49

    "I want the output in ascii instead of utf-8"

    That's not ASCII, which has no characters mapped at 0x80 or above. You're talking about ISO-8859-1, or possibly code page 1252 (the Windows encoding based on it).

    As for this attempt:

    'Ã'.decode('iso-8859-1')
    

    Well that depends on what encoding you've used to save the character Ã in the source, doesn't it? It sounds like your text editor has saved it as UTF-8. (That's a good thing, because locale-specific encodings like ISO-8859-1 need to go away ASAP.)

    Tell Python that the source file you've saved is in UTF-8 as per PEP 263:

    # coding=utf-8
    
    urllib.quote(u'Ã'.encode('iso-8859-1'))    # -> %C3
    

    Or, if you don't want that hassle, use a backslash escape:

    urllib.quote(u'\u00C3'.encode('iso-8859-1'))    # -> %C3
    

    Although, either way, a modern webapp should be using UTF-8 for its input rather than ISO-8859-1/cp1252.
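For readers on Python 3, here is a minimal sketch of the same idea. It assumes the standard `urllib.parse` module, where `quote` and `urlencode` accept an `encoding` argument, so the manual `.encode(...)` step is not needed. The variable names are only illustrative.

```python
from urllib.parse import quote, urlencode

# U+00C3 (LATIN CAPITAL LETTER A WITH TILDE), written as an escape so the
# result does not depend on how this file is saved.
text = "\u00c3"

utf8_quoted = quote(text)                           # UTF-8 is the default
latin1_quoted = quote(text, encoding="iso-8859-1")  # one byte per character
latin1_query = urlencode({"bla": text}, encoding="iso-8859-1")

print(utf8_quoted)    # %C3%83
print(latin1_quoted)  # %C3
print(latin1_query)   # bla=%C3
```

The UTF-8 default is usually what a modern web application expects, so the `encoding` argument is best kept for talking to legacy endpoints.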

  • 2021-01-16 12:49

    If your input is actually UTF-8 and you want iso-8859-1 as output (which is not ASCII) what you need is:

    'ñ'.decode('utf-8').encode('iso-8859-1')
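In Python 3 the same conversion starts from a `bytes` object, since `str` no longer has a `decode` method. A small sketch, assuming the input really is UTF-8:

```python
# The UTF-8 encoding of U+00F1 (n with tilde) is the two bytes C3 B1.
utf8_bytes = b"\xc3\xb1"

text = utf8_bytes.decode("utf-8")         # bytes -> text
latin1_bytes = text.encode("iso-8859-1")  # text -> bytes in the target encoding

print(latin1_bytes)  # b'\xf1'
```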
    
  • 2021-01-16 12:52

    Thanks for all the solutions. You all converge on the very same point. I made a mess by changing the correct code

    .encode('iso-8859-1') 
    

    to

    .decode('iso-8859-1')
    

    I changed it back to .encode('iso-8859-1') and it works.
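The difference between the two directions can be sketched in Python 3, where text and bytes are separate types. This is only an illustration and assumes the source text was stored as UTF-8:

```python
from urllib.parse import quote_from_bytes

text = "\u00c3"  # the character from the question

# encode: text -> bytes. This is the direction that percent-encoding needs.
latin1_bytes = text.encode("iso-8859-1")
print(quote_from_bytes(latin1_bytes))  # %C3

# decode: bytes -> text. Decoding UTF-8 bytes as ISO-8859-1 does not fail;
# it silently yields one character per byte.
mojibake = text.encode("utf-8").decode("iso-8859-1")
print(len(mojibake))  # 2
```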

  • 2021-01-16 13:00

    Have a look at unicode transliteration in python:

    from unidecode import unidecode
    print unidecode(u"\u5317\u4EB0")
    
    # That prints: Bei Jing
    

    In your case:

    bla = u'Ã'
    print unidecode(bla)

    # That prints: A
    

    This is a third-party library, which can be easily installed via:

    $ git clone http://code.zemanta.com/tsolc/git/unidecode
    $ cd unidecode
    $ python setup.py install
    
  • 2021-01-16 13:01

    The unihandecode package describes itself as:

    US-ASCII transliterations of Unicode text.
    An improved version of Python unidecode, which is a Python port of the Text::Unidecode Perl module by Sean M. Burke.

    pip install Unihandecode
    

    Then, in Python:

    import unihandecode
    print(unihandecode.unidecode(u'Ã'))
    

    This prints A.

  • 2021-01-16 13:03

    A way of converting to ASCII that works pretty well:

    import unicodedata
    unicodedata.normalize('NFKD', 'Ã'.decode('UTF-8')).encode('ascii', 'ignore')
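A Python 3 sketch of the same approach, using only the standard library. The helper name `asciify` is made up for this example. Note the limitation: characters that have no decomposition into an ASCII base letter are simply dropped.

```python
import unicodedata

def asciify(text):
    """Decompose characters (NFKD), then discard anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(asciify("\u00c3"))        # A   (A with tilde -> A + combining tilde -> A)
print(repr(asciify("\u00f8")))  # ''  (o with stroke has no decomposition)
```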
    