MongoDB InvalidDocument: Cannot encode object

粉色の甜心 2021-02-12 15:59

I am using Scrapy to scrape blogs and then store the data in MongoDB. At first I got the InvalidDocument exception, so it seemed obvious to me that the data was not in the right encoding.

4 answers
  • 2021-02-12 16:19

    I finally figured it out. The problem was not the encoding; it was the structure of the documents.

    I had based my code on the standard MongoPipeline example, which does not deal with nested Scrapy items.

    What I am doing is: BlogItem: "url" ... comments = [CommentItem]

    So my BlogItem has a list of CommentItems. The problem arises when persisting the object in the database, where I do:

    self.db[self.collection_name].insert(dict(item))
    

    Here I convert the BlogItem to a dict, but I do not convert the list of CommentItems. Because the traceback displays a CommentItem much like a dict, it did not occur to me that the problematic object was not a dict!

    The fix is to change the line that appends the comment to the comment list, as follows:

    item['comments'].append(dict(comment))
    

    Now MongoDB considers it as a valid document.
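
A more general variant of this fix, as a minimal sketch: instead of converting each comment by hand, a small recursive helper can turn any nested mapping-like item into plain dicts before the insert. `to_plain` and `FakeItem` are hypothetical names, not part of Scrapy or PyMongo; `FakeItem` only stands in for a Scrapy item (which behaves like a mapping but is not a dict) so the snippet runs without Scrapy installed.

```python
from collections.abc import Mapping


def to_plain(value):
    """Recursively convert mapping-like objects (e.g. nested items) to plain dicts."""
    if isinstance(value, Mapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(element) for element in value]
    return value


class FakeItem(Mapping):
    """Hypothetical stand-in for a Scrapy item: dict-like, but not a dict."""

    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


comment = FakeItem(author="alice", text="Nice post")
blog = FakeItem(url="http://example.com/post", comments=[comment])

shallow = dict(blog)    # only the top level becomes a dict
deep = to_plain(blog)   # nested items become dicts as well
```

In the pipeline, the insert call would then receive `to_plain(item)` instead of `dict(item)`.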

    Lastly, regarding the last part of my question, where I asked why I was getting an exception in the Python console but not in the script:

    The reason is that I was working in the Python console, which only supports ASCII, hence the error.

  • 2021-02-12 16:21

    I ran into the same error when using a NumPy array in a Mongo query:

    'myField' : { '$in': myList },
    

    The fix was simply to convert the NumPy array (np.ndarray) into a list:

    'myField' : { '$in': list(myList) },
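
A related point, sketched under the assumption that the array holds integers: `list()` on a NumPy array keeps the elements as NumPy scalars, while `ndarray.tolist()` also converts them to native Python types. Whether `list()` is enough may depend on the array's dtype, so `tolist()` is arguably the safer conversion. The values and the name `my_list` here are illustrative only.

```python
import numpy as np

my_list = np.array([1, 2, 3])   # hypothetical values

as_list = list(my_list)         # a list, but elements are still NumPy scalars
as_native = my_list.tolist()    # a list of plain Python ints

# Build the query from native Python values only
query = {"myField": {"$in": as_native}}
```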
    
  • 2021-02-12 16:31

    I got this error when running a query

    db.collection.find({'attr': {'$gte': 20}})
    

    when some records in the collection had a non-numeric value for attr.

  • 2021-02-12 16:37

    First, calling "somestring".encode(...) does not change "somestring"; it returns a new encoded string, so you should use something like:

     item['author'] = item['author'].encode('utf-8', 'strict')
    

    and the same for the other fields.
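
A short illustration of why the reassignment is needed, as a sketch with a made-up value: strings are immutable, so `encode` leaves the original untouched and hands back a new object. Note that under Python 3 the returned object is `bytes` rather than `str`.

```python
author = "Zoë"                              # hypothetical field value

encoded = author.encode("utf-8", "strict")  # returns a new object

# The original string is unchanged; only `encoded` holds the UTF-8 bytes
unchanged = author
```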
