Question
When profiling our code I was surprised to find millions of calls to
C:\Python26\lib\encodings\utf_8.py:15(decode)
I started debugging and found that across our code base there are many small bugs, usually comparing a string to a unicode or adding a string and a unicode. Python graciously decodes the strings and performs the subsequent operations in unicode.
How kind. But expensive!
I am fluent in unicode, having read Joel Spolsky and Dive Into Python...
I try to keep our code internals in unicode only.
My question: can I turn off this pythonic nice-guy behavior? At least until I find all these bugs and fix them (usually by adding a u prefix)?
Some of them are extremely hard to find (a variable that is sometimes a string...).
Python 2.6.5 (and I can't switch to 3.x).
Answer 1:
The following should work:
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('undefined')
>>> u"abc" + u"xyz"
u'abcxyz'
>>> u"abc" + "xyz"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/encodings/undefined.py", line 22, in decode
    raise UnicodeError("undefined encoding")
UnicodeError: undefined encoding
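Once a traceback like this points at a mixed operation, one way to fix the call site is to decode the byte string explicitly where it enters the code. The helper below is only a sketch: the name to_unicode is hypothetical, and UTF-8 is an assumption based on the utf_8.py frames in the question's profile.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as unicode text, decoding byte strings explicitly.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    # On Python 2.6+, bytes is an alias for str, so this catches byte strings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already unicode: return unchanged.
    return value


# The mixed concatenation from the session, with the decode made explicit.
result = u"abc" + to_unicode(b"xyz")
```

With the decode made explicit, the concatenation no longer relies on the default encoding, so it keeps working after the default has been set to 'undefined'.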
The reload(sys) call in the session above is only necessary here since normally sys.setdefaultencoding is supposed to go in a sitecustomize.py file in your Python site-packages directory (it's advisable to do that).
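A sitecustomize.py along these lines might look like the sketch below. The function name is hypothetical, and the hasattr-style guard is an addition so that the file is harmless on interpreters where sys.setdefaultencoding is not available.

```python
# sitecustomize.py -- a sketch for Python 2; place it in site-packages.
import sys


def disable_implicit_decoding():
    """Set the default encoding to 'undefined' if the interpreter allows it.

    Returns True when the setting was applied, False otherwise.
    """
    # In Python 2, site.py removes sys.setdefaultencoding after importing
    # sitecustomize, so it is still reachable while this file is executing.
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False
    # Any implicit str <-> unicode conversion now raises UnicodeError.
    setter("undefined")
    return True


disable_implicit_decoding()
```

Because this changes behavior for every script run by that interpreter, it is probably best kept to a development or test environment while the mixed-string bugs are being tracked down.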
Source: https://stackoverflow.com/questions/2851481/can-i-turn-off-implicit-python-unicode-conversions-to-find-my-mixed-strings-bugs