Removing non-ASCII characters from any given string type in Python

清歌不尽 2021-02-10 18:55
>>> teststring = 'aõ'
>>> type(teststring)
<type 'str'>
>>> teststring
'a\xf5'
>>> print teststring
aõ
>>> test         


        
2 Answers
  • 2021-02-10 19:29

    It's simple: in Python 2, .encode converts unicode objects into byte strings (str), and .decode converts byte strings into unicode objects.
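As a rough illustration of the two directions, here is a minimal sketch. It assumes Python 3, where the Unicode type is named `str` and the byte string type is named `bytes` (the session in the question is Python 2, where those are `unicode` and `str`). The choice of Latin-1 is also an assumption, picked only because it maps õ to the single byte 0xF5 seen in the question.

```python
# Assumption: Python 3 naming. `str` is Unicode text, `bytes` is a byte string.
text = "a\u00f5"                       # 'aõ' as Unicode text

encoded = text.encode("latin-1")       # Unicode text -> byte string
decoded = encoded.decode("latin-1")    # byte string -> Unicode text

print(type(encoded).__name__, encoded)          # bytes b'a\xf5'
print(type(decoded).__name__, decoded == text)  # str True
```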

  • 2021-02-10 19:43

    Why did the decode("ascii") give out a unicode string?

    Because that's what decode is for: it decodes byte strings like your ASCII one into unicode.

    In your second example, you're trying to "decode" a string which is already unicode, which has no effect. To print it to your terminal, though, Python must encode it using your default encoding, which is ASCII. Because you haven't done that step explicitly, and therefore haven't specified the 'ignore' parameter, it raises an error saying that it can't encode the non-ASCII characters.
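The difference between the default (strict) error handling and 'ignore' can be sketched as follows. This assumes Python 3 syntax and does not reproduce the exact session from the question; it only shows the encode step that fails when no error handler is given.

```python
# Assumption: Python 3, where "a\u00f5" is Unicode text ('aõ').
text = "a\u00f5"

try:
    text.encode("ascii")               # 'strict' is the default error handler
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True               # U+00F5 has no ASCII representation

# With 'ignore', characters that cannot be encoded are silently dropped.
ignored = text.encode("ascii", "ignore")

print(strict_failed, ignored)          # True b'a'
```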

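    The trick to all of this is remembering that decode takes an encoded byte string and converts it to Unicode, and encode does the reverse. It might be easier if you understand that Unicode is not an encoding: it is the text itself, and encodings such as ASCII or UTF-8 are ways of turning that text into bytes.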
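Putting that rule to work for the original question, one possible helper that accepts either string type might look like the sketch below. The name `strip_non_ascii` is hypothetical, and the code assumes Python 3 (`bytes` for byte strings, `str` for Unicode); in Python 2 the same idea would check for `str` and `unicode` instead.

```python
def strip_non_ascii(value):
    """Return `value` as text with every non-ASCII character removed.

    Hypothetical helper, written for Python 3 naming (bytes / str).
    """
    if isinstance(value, bytes):
        # Byte string: decode as ASCII, dropping bytes outside 0x00-0x7F.
        return value.decode("ascii", "ignore")
    # Unicode text: encode to ASCII dropping what does not fit, then decode back.
    return value.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii(b"a\xf5"))    # a
print(strip_non_ascii("a\u00f5"))   # a
```

Returning text in both cases is a design choice made here for simplicity; a variant could return the same type it was given.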
