memory location in unicode strings

暗喜 2021-01-19 01:11

Can someone explain why, when I create equal unicode strings in Python 2.7, they do not point to the same location in memory, as "normal" strings do?

2 Answers
  • 2021-01-19 01:26

    I think regular strings are interned but unicode strings are not. This simple test seems to support my theory (Python 2.6.6):

    >>> intern("string")
    'string'
    >>> intern(u"unicode string")
    
    Traceback (most recent call last):
      File "<pyshell#18>", line 1, in <module>
        intern(u"unicode string")
    TypeError: intern() argument 1 must be string, not unicode
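
    For comparison, here is a minimal sketch of the same experiment on Python 3, where the built-in moved to sys.intern and str is the Unicode type. It assumes CPython 3.x; the helper name try_intern is made up for illustration and is not part of any library.

```python
import sys

def try_intern(value):
    """Return the interned object, or the TypeError text if value cannot be interned."""
    try:
        return sys.intern(value)
    except TypeError as exc:
        return "TypeError: %s" % exc

# In 3.x, str is the Unicode type and is accepted by sys.intern.
text_result = try_intern("unicode string")

# bytes plays the role of the 2.x str type and is rejected.
bytes_result = try_intern(b"byte string")

print(repr(text_result))
print(bytes_result)
```

    So the situation is reversed in 3.x: the Unicode type can be interned explicitly, while byte strings cannot.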
    
  • 2021-01-19 01:32

    Normal strings are not guaranteed to be interned. Sometimes they are, sometimes they aren't. The rules are complicated, version-specific, and intentionally not documented.

    You can depend on the fact that Python tries to intern small-ish, commonly-used objects whenever it's a good idea. And that, if you write any code that depends on either a1 is a2 or the converse, it will break whenever it's most inconvenient.
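
    A short sketch of the safe pattern, written for Python 3 (an assumption; on 2.x the built-in intern() plays the role of sys.intern for str only). The strings are built at runtime so that the compiler cannot merge them as constants.

```python
import sys

# Build two equal strings at runtime rather than from literals.
parts = ["uni", "code", " string"]
a1 = "".join(parts)
a2 = "".join(parts)

# Value comparison is what the language guarantees.
equal = a1 == a2

# Identity of two independently built strings is an implementation detail;
# the result is printed here but should never be relied on.
same_object = a1 is a2

# Explicit interning is the one case where equal strings are
# guaranteed to come back as the same object.
i1 = sys.intern(a1)
i2 = sys.intern(a2)
interned_same = i1 is i2

print(equal, same_object, interned_same)
```

    Compare with == by default, and reach for sys.intern only when identity is really needed, for example as a lookup-speed optimization.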

    If you want any more than this, you have to look at the source for whichever version of whichever implementation you're interested in. For CPython, the details are mostly inside stringobject.c for 2.6 and 2.7, unicodeobject.c for 3.3.

    The latter file of course also exists in 2.x, where it still defines the unicode type; that type just isn't the same as str, as it is in 3.x. You can see from the 2.7 source that there is some support for interning unicode strings, even if you can't call intern on them. From a quick glance, it looks like 2.7 can handle interned unicode strings, but won't ever create them.

    Meanwhile, 3.3 makes things even more fun, as a str object can point at Latin-1, UCS-2, or UCS-4 storage (1, 2, or 4 bytes per character, per PEP 393), which might be interned, but code that uses the old-style Unicode APIs may still end up with a new copy. So, even if a1 is a2, if you try to get at their characters, they may have different buffers.
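
    The variable-width storage in 3.3 and later can be observed indirectly from Python. The sketch below assumes CPython 3.3+ and uses sys.getsizeof, whose results are implementation-specific; bytes_per_char is a hypothetical helper, not a standard function. It subtracts the sizes of two strings of different lengths so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by differencing two repeat counts of ch."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One character from each range that selects a different storage width.
ascii_width = bytes_per_char("a")            # ASCII / Latin-1 range
bmp_width = bytes_per_char("\u4e2d")         # Basic Multilingual Plane
astral_width = bytes_per_char("\U0001F600")  # outside the BMP

print(ascii_width, bmp_width, astral_width)
```

    On CPython 3.3+ the three widths should come out as 1, 2, and 4 bytes; other implementations may store strings differently.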

    The question "When does python choose to intern a string" has some more insight into the details. But again, the source is all that matters.
