Question
I am using Python 2.7.3. Can anybody explain the difference between the literals:
'\u0391'
and:
u'\u0391'
and the different way they are echoed in the REPL below (especially the extra backslash added to a1):
>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>>
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>>
Answer 1:
You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python byte string ('some text').
It's like using \t versus \T; the former has meaning in Python literals (it's interpreted as a tab character), the latter just means a backslash and a capital T (two characters).
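The \t versus \T comparison can be checked directly. The following is a minimal sketch; the variable names are only illustrative, and the second literal is written with an explicit double backslash because newer Python 3 releases emit a warning for unrecognized escapes such as \T.

```python
tab = '\t'            # recognized escape: a single tab character
backslash_t = '\\T'   # what '\T' amounts to: a backslash followed by a capital T

print(len(tab))           # 1
print(len(backslash_t))   # 2
print(repr(backslash_t))  # '\\T'
```

The doubled backslash in the last line of output is the same effect seen with a1 in the question: repr() escapes the backslash so that the echoed text is itself a valid literal.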
To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.
Note: in Python 3, the same differences apply, but 'some text' is a Unicode string literal, and b'some text' is the bytestring syntax.
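As a rough illustration of the Python 3 behaviour described in the note, the sketch below compares a text literal with a bytes literal. It assumes Python 3; the variable names are illustrative, and the backslash in the bytes literal is doubled explicitly so that no unrecognized-escape warning is triggered.

```python
import unicodedata

text = '\u0391'     # str in Python 3: the escape is evaluated to one code point
data = b'\\u0391'   # bytes: six ASCII bytes, no Unicode escape processing

print(len(text))               # 1
print(unicodedata.name(text))  # GREEK CAPITAL LETTER ALPHA
print(len(data))               # 6
print(text.encode('utf-8'))    # b'\xce\x91'

# The unicode_escape codec interprets the six bytes as an escape sequence.
print(data.decode('unicode_escape') == text)  # True
```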
Answer 2:
As opposed to C, in Python a string can be enclosed in single quotes (') as well as double quotes ("), leaving aside the triple-quoted forms (''' and """).
Thus, '\u0391' is only a string containing the six characters \, u, 0, 3, 9 and 1. When the REPL echoes this string, it shows its repr(), in which the \ is escaped via another \.
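The six characters and the doubled backslash can be seen in a short sketch. It uses a raw literal (r'...') as a stand-in so that the snippet behaves the same on Python 2 and Python 3; the name plain is only illustrative.

```python
plain = r'\u0391'    # raw literal: the six characters a Python 2 '\u0391' holds

print(len(plain))    # 6
print(list(plain))   # ['\\', 'u', '0', '3', '9', '1']
print(plain)         # \u0391
print(repr(plain))   # '\\u0391'
```

print shows the string's actual contents with a single backslash, while repr(), which the REPL uses when echoing a value, shows the escaped form.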
On the contrary, having a u in front makes the literal a Unicode string, in which \u escapes are evaluated. Thus, u'\u0391' is interpreted as "the Unicode string containing code point U+0391" (GREEK CAPITAL LETTER ALPHA), a single character, which is different from the above.
Source: https://stackoverflow.com/questions/14559444/python-unicode-string-literals-whats-the-difference-between-u0391-and-u