Iterating through PyMongo cursor throws InvalidBSON: year is out of range

Submitted by 给你一囗甜甜゛ on 2019-12-22 20:34:42

Question


I am using PyMongo to iterate over a MongoDB collection, but I'm struggling to handle very large MongoDB date objects.

For example, if I have some data in a collection that looks like this:

"bad_data" : [ 
            {
                "id" : "id01",
                "label" : "bad_data",
                "value" : "exist",
                "type" : "String",
                "lastModified" : ISODate("2018-06-01T10:04:35.000Z"),
                "expires" : Date(9223372036854775000)
            }
        ]

I will do something like:

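from pymongo import MongoClient

# Credentials are passed to MongoClient; Database.authenticate() was removed in PyMongo 4
client = MongoClient('localhost', username='user', password='pass', authSource='admin')
db = client['db1']
collection = db['collection']
# A Collection is not iterable itself; iterate over the cursor returned by find()
for i in collection.find():
    # do something with i
    pass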

and get the error InvalidBSON: year 292278994 is out of range

Is there any way I can handle this ridiculous Date() object without bson falling over? I realise that having such a date in MongoDB is crazy, but there is nothing I can do about it as it's not my data.


Answer 1:


There actually is a section in the PyMongo FAQ about this very topic:

Why do I get OverflowError decoding dates stored by another language’s driver?

PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.

So the basic constraint here is the datetime.datetime type that the driver maps BSON dates to. A BSON date is a 64-bit count of milliseconds since the Unix epoch, so although the value might look "ridiculous", it is perfectly valid for a driver in another language to store it.
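To make the constraint concrete, here is a minimal standard-library sketch (no PyMongo involved) that tries to convert a millisecond count to a datetime. The helper names `millis_to_datetime` and `approximate_year` are made up for illustration, and the year estimate is only approximate because it uses the mean Gregorian year length.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def millis_to_datetime(ms):
    """Return a UTC datetime for `ms` since the epoch, or None if out of range."""
    try:
        return EPOCH + timedelta(milliseconds=ms)
    except OverflowError:
        # Raised when the value cannot be represented by timedelta/datetime.
        return None


def approximate_year(ms):
    """Rough calendar year, using the mean Gregorian year length."""
    return int(1970 + (ms // MS_PER_DAY) / 365.2425)


expires_ms = 9223372036854775000  # the "expires" value from the question

print(datetime.max.year)               # 9999
print(millis_to_datetime(expires_ms))  # None
print(approximate_year(expires_ms))    # 292278994
```

The estimated year lines up with the one reported in the InvalidBSON message, which suggests the stored value is simply close to the largest 64-bit millisecond count.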

As the FAQ points out, your general workarounds are:

  1. Deal with the offending BSON Date. Whilst it is valid to store, it was possibly not the "true" intention of whoever or whatever stored it in the first place.

  2. Add a "date range" condition to your code to filter "out of range" dates:

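    from datetime import datetime

    # Only match documents whose 'expires' fits in datetime.datetime
    result = db['collection'].find({
      'expires': { '$gte': datetime.min, '$lte': datetime.max }
    })
    for i in result:
      # do something with i
      pass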
    
  3. Omit the offending date field in projection if you don't need the data in further processing:

    # Exclude 'expires' from the returned documents so it is never decoded
    result = db['collection'].find({}, projection={ 'expires': False })
    for i in result:
      # do something with i
      pass
    

Certainly 'expires' as a name suggests the original intent was a date so far in the future that it would never arrive. The original author of that data (and quite possibly the code still writing it) was probably unaware of the Python date constraint. So it's probably quite safe to "lower" that number in all existing documents and in any code that is still writing it.
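As a rough sketch of what "lowering" the value could look like, the helper below caps every oversized date with a single update_many call. The function name `cap_expiry_dates` and the chosen cap are hypothetical, and it assumes 'expires' is a top-level field as in the snippets above. If the field sits inside an array, as in the question's sample document, the update would need to be adapted.

```python
from datetime import datetime

# Assumed cap: the last whole second that datetime.datetime can represent.
MAX_SAFE_EXPIRY = datetime(9999, 12, 31, 23, 59, 59)


def cap_expiry_dates(collection, field="expires", cap=MAX_SAFE_EXPIRY):
    """Lower every `field` value later than `cap` down to `cap`.

    `collection` is expected to be a pymongo Collection. The comparison and
    the update both run on the server, so the oversized dates are never
    decoded on the Python side.
    """
    result = collection.update_many(
        {field: {"$gt": cap}},   # match dates beyond the cap
        {"$set": {field: cap}},  # overwrite them with the cap
    )
    return result.modified_count
```

With a connected database this might be called as `cap_expiry_dates(db['collection'])`. Try it on a copy of the data first, since the update overwrites the original values.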



Source: https://stackoverflow.com/questions/53177590/iterating-through-pymongo-cursor-throws-invalidbson-year-is-out-of-range
