Comprehensive character replacement module in python for non-unicode and non-ascii for HTML

人走茶凉 提交于 2019-12-08 04:06:57

问题


Is there a comprehensive character replacement module for python that finds all non-ascii or non-unicode characters in a string and replaces them with ascii or unicode equivilents? This comfort with the "ignore" argument during encoding or decoding is insane, but likewise so is a '?' in every place that a non translated character was.

I'm looking for one module that finds irksome characters and conforms them to whatever standard is requested. I realize that the amount of extant alphabets and encodings makes this somewhat impossible, but surely someone has taken a stab at it? Even a rudimentary solution would be better than the status quo.

The simplification for data transfer that this would mean is enormous.


回答1:


i don't think what you want is really possible - but i think there is a decent option.

unicodedata has a 'normalize' method that can gracefully degrade text for you...

import unicodedata
def gracefully_degrade_to_ascii( text ):
    return unicodedata.normalize('NFKD',text).encode('ascii','ignore')

assuming the charset you're using is already mapped into unicode - or at least can be mapped into unicode - you should be able to degrade the unicode version of that text down to ascii or utf-8 with this module ( it's part of the standard library too )

Full Docs - http://docs.python.org/library/unicodedata.html




回答2:


To look at any individual character and guess its encoding would be hard and probably not very accurate. However, you can use chardet to try and detect the encoding of an entire file. Then you can use the string decode() and encode() methods to convert its encoding to UTF-8.

http://pypi.python.org/pypi/chardet

And UTF-8 is backwards compatible with ASCII so that won't be a big deal.



来源:https://stackoverflow.com/questions/12830047/comprehensive-character-replacement-module-in-python-for-non-unicode-and-non-asc

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!