Removing non-ASCII characters from any given string type in Python

Submitted by 不羁岁月 on 2019-12-03 08:25:52

It's simple: .encode converts Unicode objects into byte strings, and .decode converts byte strings into Unicode.

Why did the decode("ascii") give out a unicode string?

Because that's what decode is for: it decodes byte strings like your ASCII one into unicode.

In your second example, you're trying to "decode" a string which is already unicode, which has no effect. To print it to your terminal, though, Python must encode it using your default encoding, which is ASCII. Because you haven't done that step explicitly, and therefore haven't specified the 'ignore' parameter, it raises an error saying it can't encode the non-ASCII characters.
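A minimal Python 3 sketch of that failure, assuming a made-up sample string `text`. The explicit `encode("ascii")` call stands in for the implicit encoding step described above, and the `failed` flag is only there for illustration:

```python
# Sample Unicode text with two non-ASCII characters (an assumption for this demo).
text = "r\u00e9sum\u00e9"

# Strict encoding (the default error handler) cannot represent the accented
# characters in ASCII, so it raises UnicodeEncodeError.
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Passing 'ignore' tells the codec to drop unencodable characters instead.
cleaned = text.encode("ascii", "ignore")

print(failed)
print(cleaned)
```

Encoding explicitly with 'ignore' before printing avoids the error, at the cost of silently dropping the characters ASCII cannot represent.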

The trick to all of this is remembering that decode takes an encoded bytestring and converts it to Unicode, and encode does the reverse. It might be easier if you understand that Unicode is not an encoding.
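As a rough illustration of the two directions, here is a small Python 3 sketch, where the text type is `str` and the byte-string type is `bytes`. The helper name `strip_non_ascii` and the sample values are assumptions made up for this example, not part of the original question:

```python
# Sample Unicode text with non-ASCII characters (an assumption for this demo).
text = "caf\u00e9 na\u00efve"

# encode: Unicode text -> encoded byte string.
# 'ignore' drops every character that ASCII cannot represent.
ascii_bytes = text.encode("ascii", "ignore")

# decode: encoded byte string -> Unicode text.
ascii_text = ascii_bytes.decode("ascii")


def strip_non_ascii(value):
    """Hypothetical helper: return text with non-ASCII content removed.

    Accepts either a byte string or Unicode text.
    """
    if isinstance(value, bytes):
        # Already encoded: decode it, skipping bytes outside the ASCII range.
        return value.decode("ascii", "ignore")
    # Unicode text: encode to ASCII (dropping the rest), then decode back.
    return value.encode("ascii", "ignore").decode("ascii")


print(ascii_bytes)
print(ascii_text)
print(strip_non_ascii("caf\u00e9"))
print(strip_non_ascii(b"caf\xc3\xa9"))
```

Checking the type first is what lets one function handle both kinds of input: byte strings only need decoding, while Unicode text needs an encode followed by a decode.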
