Ridiculously slow mongoDB query on small collection in simple but big database


Question


So I have a super simple database in mongoDB with a few collections:

> show collections
Aggregates <-- count: 92
Users <-- count: 68222
Pages <-- count: 1728288847, about 1.1TB
system.indexes

The Aggregates collection is an aggregate of the Pages collection, and each document looks like this:

> db.Aggregates.findOne()
{
        "_id" : ObjectId("50f237126ba71610eab3aaa5"),
        "daily_total_pages" : 16929799,
        "day" : 21,
        "month" : 9,
        "year" : 2011
}

Very simple. However, let's try to get the total page loads by adding all 92 days' daily page loads together:

>>> def get_total():
...     start = datetime.now()
...     print sum([x['daily_total_pages'] for x in c.Aggregates.find()])
...     end = datetime.now()
...     print (end-start).seconds
...
>>> get_total()
1728288847
43

43 seconds?!??!??!?!

Those 92 aggregate results are tiny! I might as well just store them in a text file, this is crazy.

Or are they tiny? How big are they on disk according to mongo?

> db.Aggregates.stats()
{
        "ns" : "c.AggregateResults",
        "count" : 92,
        "size" : 460250104,
        "avgObjSize" : 5002718.521739131,
        "storageSize" : 729464832,
        "numExtents" : 7,
        "nindexes" : 2,
        "lastExtentSize" : 355647488,
        "paddingFactor" : 1.0690000000000066,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 16352,
        "indexSizes" : {
                "_id_" : 8176,
                "date_1" : 8176
        },
        "ok" : 1
}

438 megabytes total for those tiny daily numbers? Each one is approximately 280 bytes, so they should be a maximum of 25-30 KB total. So the storage is huge and the query is super slow. Is it possible it could be fragmented on disk? I created the aggregates after inserting the documents into the full Pages collection.

Anyone have any insights into this madness? :O


Edit: Solved with a more specific find() query from Jared. The video that Sammaye provided also gives some very interesting storage insights.


Edit 2: So I found out that using sys.getsizeof() is a really unreliable way of finding out your documents' sizes, since it doesn't recurse into nested structures. So my docs were actually quite large, and the best thing to do was to use find({}, {'daily_total_pages': 1}) as a more specific query!
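For reference, here is a minimal sketch of how to measure a document's real size instead of relying on sys.getsizeof(): serialize it to BSON with PyMongo's bundled bson module and count the bytes. This is not from the original post; c is assumed to be the same pymongo database handle used above, and the mongo shell's Object.bsonsize() reports the same number.

import bson  # ships with PyMongo

def bson_size(doc):
    # Encode the document exactly as MongoDB stores it and count the bytes;
    # unlike sys.getsizeof() this includes every nested field.
    return len(bson.BSON.encode(doc))  # newer PyMongo also offers bson.encode()

doc = c.Aggregates.find_one()
print(bson_size(doc))  # ~5 MB per document here, not ~280 bytes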


Answer 1:


The avgObjSize is out of line with the 280-byte estimate. It's saying your objects are averaging around 5 MB, and storageSize is near 1 GB. If you're memory-constrained, running a query that needs to access all 1 GB of the data files will cause lots of page faults.

Have you tried compacting?

db.runCommand({compact: 'Aggregates'})

or repairing?

db.repairDatabase()
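If you'd rather trigger these maintenance commands from Python instead of the mongo shell, roughly equivalent calls go through PyMongo's command() helper. This is just a sketch, assuming c is the same database handle used in the question; note that repairDatabase only exists on older servers (it was removed in MongoDB 4.2).

c.command('compact', 'Aggregates')   # rewrite the collection's extents to reclaim/defragment space
c.command('repairDatabase')          # rebuild all of the database's data files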

If that doesn't work, try pulling back just the fields needed for the sum rather than the whole documents. It may be that those documents really are about 5 MB each and the time is spent pulling the data over the wire.

from datetime import datetime  # c is the same pymongo database handle as in the question

def get_total():
    start = datetime.now()
    # Project only daily_total_pages so the ~5 MB documents aren't
    # shipped over the wire just to read one integer each.
    print sum([x['daily_total_pages'] for x in c.Aggregates.find({}, {"daily_total_pages": 1})])
    end = datetime.now()
    print (end-start).seconds
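A further option, not part of the original answer: let the server do the addition with the aggregation framework (available since MongoDB 2.2), so only one tiny result document crosses the wire. A rough sketch with PyMongo 3+ (older 2.x drivers return a dict with a 'result' list instead of a cursor):

# Sum daily_total_pages on the server; only the total travels back to the client.
pipeline = [{"$group": {"_id": None, "total": {"$sum": "$daily_total_pages"}}}]
for doc in c.Aggregates.aggregate(pipeline):
    print(doc["total"])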


Source: https://stackoverflow.com/questions/14385696/ridiculously-slow-mongodb-query-on-small-collection-in-simple-but-big-database
