How can you make Python 2.x warn when coercing strings to unicode?

长发绾君心 2021-01-19 10:40

A very common source of encoding errors is that Python 2 will silently coerce byte strings (str) to unicode when you add them to unicode objects. This can

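To make the failure mode concrete, here is a minimal sketch of what the implicit coercion amounts to. Python 2 decodes the byte string with the default codec (ASCII unless it has been changed) before concatenating. The helper name `py2_style_concat` is made up for illustration; it spells that decode out explicitly so the snippet behaves the same on Python 2 and Python 3.

```python
def py2_style_concat(byte_string, text, encoding="ascii"):
    """Hypothetical helper: do explicitly what Python 2 does implicitly."""
    # Python 2 performs the equivalent of this decode behind the scenes,
    # always with the default codec, which is ASCII unless changed.
    return byte_string.decode(encoding) + text


# Pure-ASCII bytes coerce without any visible problem.
ascii_result = py2_style_concat(b"bar", u"baz")

# Non-ASCII bytes fail only at the point of concatenation.
try:
    py2_style_concat(b"caf\xc3\xa9", u"baz")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the actual encoding at the boundary avoids the error.
explicit_result = py2_style_concat(b"caf\xc3\xa9", u"baz", encoding="utf-8")
```

The second call is the case that tends to surface late: the code works in testing with ASCII data and breaks once non-ASCII input arrives.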
2 answers
  • 2021-01-19 11:00

    I did a little more research after asking this question and hit on what looks like the perfect answer. Armin Ronacher wrote a small tool called unicode-nazi. Install it and run your program like this:

    python -Werror -municodenazi myprog.py
    

    and you get a traceback right where the coercion happened:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "SITE-PACKAGES/unicodenazi.py", line 128, in <module>
        main()
      File "SITE-PACKAGES/unicodenazi.py", line 119, in main
        execfile(sys.argv[0], main_mod.__dict__)
      File "myprog.py", line 4, in <module>
        print foo()
      File "myprog.py", line 2, in foo
        return 'bar' + u'baz'
      File "SITE-PACKAGES/unicodenazi.py", line 34, in warning_decode
        stacklevel=2)
    UnicodeWarning: Implicit conversion of str to unicode
    

    If you're dealing with Python libraries that trigger implicit coercions themselves, and you can't catch the exceptions or otherwise work around them, you can leave out -Werror:

    python -municodenazi myprog.py
    

    and at least see a warning printed out on stderr when it happens:

    /SITE-PACKAGES/unicodenazi.py:119: UnicodeWarning: Implicit conversion of str to unicode
      execfile(sys.argv[0], main_mod.__dict__)
    barbaz
    
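The difference between the two invocations above comes down to the warnings filter: `-Werror` turns every warning into an exception, while the default filter only prints it. The sketch below reproduces that behaviour inside a program with the standard `warnings` module. The function `implicit_decode` is a hypothetical stand-in for the tool's hook, not its actual code; it only emits the same warning message and decodes as ASCII.

```python
import warnings


def implicit_decode(byte_string):
    """Hypothetical stand-in for the hook: warn, then decode as ASCII."""
    warnings.warn("Implicit conversion of str to unicode",
                  UnicodeWarning, stacklevel=2)
    return byte_string.decode("ascii")


def run_with_filter(action):
    """Run one coercion under the given warnings filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, UnicodeWarning)
        try:
            result = implicit_decode(b"bar") + u"baz"
        except UnicodeWarning as exc:
            # With action "error" the warning is raised as an exception.
            return "raised", str(exc)
        # Otherwise the warning is recorded and execution continues.
        return result, [str(w.message) for w in caught]


strict = run_with_filter("error")    # comparable to running with -Werror
lenient = run_with_filter("always")  # comparable to running without it
```

If `-Werror` is too broad because unrelated warnings also become fatal, the filter can be narrowed on the command line with `-W error::UnicodeWarning`.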
  • 2021-01-19 11:15

    That error isn't cryptic at all. I can gather from it that urllib.quote() (which is called by quote_plus()) doesn't handle unicode very well. After some quick googling I found this previous SO question asking for unicode-safe alternatives. Unfortunately, none seem to exist.

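A workaround that is often used for this, offered here as an assumption rather than something the answer or the linked question confirms, is to encode the text to UTF-8 bytes before quoting. The wrapper name `quote_plus_utf8` is hypothetical; `quote_plus` itself is the standard-library function (in `urllib` on Python 2, in `urllib.parse` on Python 3).

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def quote_plus_utf8(text):
    """Hypothetical wrapper: quote text after encoding it as UTF-8."""
    # On Python 2, bytes is an alias for str, so only unicode objects
    # are encoded here; byte strings pass through unchanged.
    if not isinstance(text, bytes):
        text = text.encode("utf-8")
    return quote_plus(text)


quoted = quote_plus_utf8(u"caf\xe9 au lait")
```

This assumes the receiving side expects UTF-8 percent-encoding, which is common but not guaranteed.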