Question
From the pymongo docs:
"MongoDB stores data in BSON format. BSON strings are UTF-8 encoded so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str."
It seems a bit silly to me that the database can only store UTF-8 encoded strings, but the return type in pymongo is unicode, meaning the first thing I have to do with every string from the document is once again call encode('utf-8') on it. Is there some way around this, i.e. telling pymongo not to give me unicode back but just give me the raw str?
Answer 1:
No, there is no such feature in PyMongo; every string decoded from BSON is decoded as UTF-8. Python represents the string internally as UCS-2 or some other format, depending on the Python version. See the code where the BSON decoder extracts a string.
In the upcoming PyMongo 3.x series we may add features for more flexible BSON decoding to allow developers to optimize uncommon use cases like this.
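Until something like that exists, one workaround is to convert the document after PyMongo has decoded it. The following is only a minimal sketch: `encode_strings` is a hypothetical helper, not part of PyMongo, and the `doc` literal stands in for whatever a query such as `find_one()` would return. It walks the document and encodes every text string back to UTF-8, which yields `str` on Python 2 (and `bytes` on Python 3).

```python
# Hypothetical helper, not part of PyMongo: walk a decoded document and
# encode every text string back to UTF-8 bytes.
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def encode_strings(value, encoding="utf-8"):
    """Return a copy of `value` with all text strings encoded to bytes."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    if isinstance(value, dict):
        return {encode_strings(k, encoding): encode_strings(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode_strings(item, encoding) for item in value]
    return value  # numbers, dates, ObjectId, etc. pass through unchanged


# Stand-in for a document returned by a query (assumed shape, for illustration)
doc = {u"author": u"Mike", u"tags": [u"mongodb", u"python"], u"views": 3}
encoded = encode_strings(doc)
```

This costs an extra pass over every document, so it probably only makes sense when the surrounding code genuinely requires byte strings.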
Source: https://stackoverflow.com/questions/18103497/how-can-i-get-pymongo-to-always-return-str-and-not-unicode