Remove all hex characters from string in Python

失恋的感觉 2021-02-04 11:18

Although there are similar questions, I can't seem to find a working solution for my case:

I'm encountering some annoying hex chars in strings, e.g.

4 Answers
  •  轻奢々 (original poster)
    2021-02-04 11:49

    These are not "hex characters" but the escaped representations (UTF-8 encoded bytes in the first case, a Unicode code point in the second) of the Unicode characters 'LEFT DOUBLE QUOTATION MARK' ('“') and 'RIGHT DOUBLE QUOTATION MARK' ('”').

    >>> s = "\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah"
    >>> print s
    “http://www.google.com” blah blah#%#@$^blah
    >>> s.decode("utf-8")
    u'\u201chttp://www.google.com\u201d blah blah#%#@$^blah'
    >>> print s.decode("utf-8")
    “http://www.google.com” blah blah#%#@$^blah
    

    As for how to remove them, they are just ordinary characters, so a simple str.replace() will do:

    >>> s.replace("\xe2\x80\x9c", "").replace("\xe2\x80\x9d", "")
    'http://www.google.com blah blah#%#@$^blah'
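
A Python 3 version of the same `str.replace()` approach, as a minimal sketch: after decoding, each quote is a single character, so you replace `"\u201c"` and `"\u201d"` rather than their three-byte UTF-8 sequences. `str.translate` with a table that maps each character to `None` does the same deletion in one pass. Both helper names are hypothetical:

```python
# Hypothetical helper: delete the two curly double quotes from a decoded str.
def strip_curly_quotes(text: str) -> str:
    # Chained replace, mirroring the answer's approach on decoded text.
    return text.replace("\u201c", "").replace("\u201d", "")


# Single-pass alternative: translate() deletes characters mapped to None.
_CURLY_QUOTES = str.maketrans({"\u201c": None, "\u201d": None})


def strip_curly_quotes_translate(text: str) -> str:
    return text.translate(_CURLY_QUOTES)


raw = b"\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah"
text = raw.decode("utf-8")

print(strip_curly_quotes(text))
# http://www.google.com blah blah#%#@$^blah
```

The `translate` form scales better if you later need to drop more than a couple of specific characters.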
    

    If you want to get rid of all non-ASCII characters at once, decode to unicode, then encode to ASCII with the "ignore" error handler:

    >>> s.decode("utf-8").encode("ascii", "ignore")
    'http://www.google.com blah blah#%#@$^blah'
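
In Python 3, `str` has no `.decode()` method, so the chain becomes: decode (bytes to str), encode to ASCII with `"ignore"`, then decode again if you want a `str` back. A minimal sketch; `to_ascii` is a hypothetical name. Note that `"ignore"` silently drops every non-ASCII character, including accented letters, not just the quotes:

```python
from typing import Union


def to_ascii(data: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Drop every non-ASCII character, returning a str."""
    # Raw bytes must be decoded first; a str is already Unicode text.
    text = data.decode(encoding) if isinstance(data, bytes) else data
    # "ignore" discards characters that cannot be encoded as ASCII.
    return text.encode("ascii", "ignore").decode("ascii")


raw = b"\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah"
print(to_ascii(raw))
# http://www.google.com blah blah#%#@$^blah
print(to_ascii("caf\u00e9"))
# caf
```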
    
