pymongo error: bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xa1 in position 25: invalid start byte

Submitted by 一笑奈何 on 2020-01-02 05:19:31

Question


tasks = list(self.collection.find().sort('_id',pymongo.DESCENDING).limit(1000))

I ran into a problem when using pymongo: the call above fails with the following traceback.

File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 1097, in next
File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 1039, in _refresh
File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 903, in __send_message
File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\helpers.py", line 133, in _unpack_response
bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xa1 in position 25: invalid start byte

tasks = self.collection.find().sort('_id', pymongo.DESCENDING).limit(1000)
for task in tasks:  # Iterating the cursor this way hits the same problem

task = self.collection.find_one()  # This fails in the same way too

I stepped into pymongo to find the cause. The problem seems to come from the following code:

    result = {"cursor_id": struct.unpack("<q", response[4:12])[0],
              "starting_from": struct.unpack("<i", response[12:16])[0],
              "number_returned": struct.unpack("<i", response[16:20])[0],
              "data": bson.decode_all(response[20:], codec_options)}

This is line 133 of pymongo's helpers.py. Inside bson.decode_all, the failure appears to come from decoding 'oid', which is the _id field in MongoDB. When I copied the document into a new, identical document with a new _id, I could read the copy successfully.
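The slice offsets in that snippet can be reproduced on a hand-built buffer. This is only a minimal sketch: it assumes the reply body starts with a 4-byte flags field followed by the three fields unpacked above, and the cursor id and counts are made-up values. It suggests that the fixed-size header decodes without any text handling, so a UTF-8 error can only come from the document bytes starting at offset 20.

```python
import struct

# Hypothetical reply body: flags, cursor id, starting_from, number_returned,
# followed by the raw BSON documents (here one empty document, 5 bytes long).
empty_doc = struct.pack("<i", 5) + b"\x00"
response = struct.pack("<iqii", 0, 12345, 0, 1) + empty_doc

# Same slices as in the pymongo snippet; these are plain little-endian
# integers, so no string decoding happens here.
header = {
    "cursor_id": struct.unpack("<q", response[4:12])[0],
    "starting_from": struct.unpack("<i", response[12:16])[0],
    "number_returned": struct.unpack("<i", response[16:20])[0],
}

# Everything from offset 20 on is document data; this is the part that
# bson.decode_all receives and where a bad string would surface.
documents = response[20:]

print(header)
print(documents)
```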

How can I solve the problem while keeping the "for task in tasks:" style?

pymongo version used: 3.2.1


Answer 1:


You need to pass the unicode_decode_error_handler argument to MongoClient, and use at least pymongo 3.5.1.

from pymongo import MongoClient

if __name__ == '__main__':
    # 'ignore' drops bytes that cannot be decoded as UTF-8 instead of raising.
    client = MongoClient(
        host="whatever_your_host_is",
        maxPoolSize=50,
        unicode_decode_error_handler='ignore'
    )

    my_db = client['my_db']
    collection = my_db['my_collection']

    cursor = collection.find({"whatever": "some_stuff"})

    for document in cursor:
        print(document)

It looks like 'ignore' is already the default behaviour on Python 2.7, but on Python 3.6.1 you have to set it yourself. The option makes pymongo skip bytes that are not valid UTF-8 instead of raising, so the cursor keeps iterating and each document is returned with the undecodable bytes dropped from the affected strings.
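What the handler names mean can be checked with Python's own codec machinery, without a database. The sketch below is an illustration under assumptions: the payload is invented (ASCII with one stray 0xa1 byte placed at index 25 to mirror the error message), and it uses plain bytes.decode rather than pymongo itself.

```python
# Hypothetical payload: 25 valid ASCII bytes, one stray 0xa1, then more ASCII.
raw = b"a" * 25 + b"\xa1" + b"tail"

# The default handler is 'strict', which raises on the first bad byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# 'ignore' silently drops the bad byte; 'replace' puts U+FFFD in its place.
ignored = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")

print(strict_error)
print(len(raw), len(ignored), len(replaced))
```

'ignore' loses data without leaving a trace, so 'replace' may be the safer choice when you want to see afterwards which strings were damaged.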




Answer 2:


I recently had a similar error message, and it is quite hard to find help about it.

Quick fix

I solved my problem by downgrading pymongo to a version below 3.0. The pymongo changelog advertises "A rewritten pure Python BSON implementation" in version 3.0. I found that the new implementation has trouble handling Python UTF-8 and unicode encoding when serializing to BSON.

Analysis

It seems that the error comes from invalid BSON stored in your DB ... similar to this. Maybe you should post your error there.
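To make "invalid BSON" concrete: a BSON string element is supposed to hold UTF-8, but nothing stops a writer from storing other bytes. The following sketch builds such a document by hand and reports which string fields are not valid UTF-8. The helper names (bson_string_element, bson_document, invalid_string_fields) and the field values are hypothetical, and the scanner is a toy that only understands top-level string elements; it is not pymongo's decoder.

```python
import struct


def bson_string_element(name, payload):
    """Encode one BSON string element (type 0x02) from raw payload bytes."""
    return (b"\x02" + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(payload) + 1) + payload + b"\x00")


def bson_document(*elements):
    """Wrap elements in a BSON document: int32 length, body, 0x00."""
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def invalid_string_fields(doc):
    """Return names of top-level string fields that are not valid UTF-8.

    Only understands string elements; stops at any other element type.
    """
    bad = []
    pos = 4  # skip the int32 document length
    while doc[pos] == 0x02:
        name_end = doc.index(b"\x00", pos + 1)
        name = doc[pos + 1:name_end].decode("utf-8")
        (size,) = struct.unpack("<i", doc[name_end + 1:name_end + 5])
        start = name_end + 5
        payload = doc[start:start + size - 1]  # size includes the trailing 0x00
        try:
            payload.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
        pos = start + size
    return bad


# Hypothetical document: "body" carries a stray 0xa1 byte.
doc = bson_document(
    bson_string_element("title", b"ok"),
    bson_string_element("body", b"caf\xa1"),
)
print(invalid_string_fields(doc))
```

A document like this is stored and returned by the server without complaint; the error only appears when a client decodes the strings strictly, which would be consistent with a single bad document breaking the whole cursor.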




Answer 3:


I'm using Python 3.6, pymongo 3.4.0.

According to the documentation, you can clone a collection with the 'with_options' method, which does the trick for me:

import bson
col_article = col_article.with_options(codec_options=bson.CodecOptions(unicode_decode_error_handler="ignore"))


Source: https://stackoverflow.com/questions/36314776/pymongo-error-bson-errors-invalidbson-utf8-codec-cant-decode-byte-0xa1-in-p
