MongoDB aggregation performance capability

Question

I am trying to work through some performance considerations around using MongoDB to hold a very large number of documents that will feed a variety of aggregations.

I have read that a collection has a capacity of 32 TB, depending on the chunk size and the size of the shard key values.

If I have 65,000 customers who each send us (on average) 350 sales transactions per day, that works out to about 22,750,000 documents created daily. By a sales transaction I mean an object like an invoice, with a header and line items. Each document averages 2.60 KB.

I also receive some other data from these same customers, such as account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.

Based on the above, I estimate 8,392,475,000 (8.4 billion) documents in a single year, with a total of 20,145,450,000 KB (18.76 TB) of data stored in a collection.

Given a MongoDB collection capacity of 32 TB (34,359,738,368 KB), I believe it would be at 58.63% of capacity.
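
As a rough check, the arithmetic above can be sketched in Python. The per-customer rates come from the figures in this question. The split of the non-transaction documents (one account balance per customer per day, 1,000 product records per customer) is an assumption that happens to reproduce the 8,392,475,000 total. The total size is taken as quoted rather than recomputed from the 2.60 KB average.

```python
# Inputs stated in the question.
customers = 65_000
transactions_per_customer_per_day = 350
days_per_year = 365

# Assumptions (not spelled out in the question): one account-balance
# document per customer per day, and 1,000 product documents per customer.
balance_docs_per_year = customers * days_per_year
product_docs = customers * 1_000

transactions_per_day = customers * transactions_per_customer_per_day
transactions_per_year = transactions_per_day * days_per_year
total_docs = transactions_per_year + balance_docs_per_year + product_docs

# Size figures as quoted in the question, in KB.
total_kb = 20_145_450_000
limit_kb = 32 * 1024**3  # 32 TB expressed in KB

size_tb = total_kb / 1024**3
percent_of_limit = 100 * total_kb / limit_kb

print(f"{transactions_per_day:,} transactions per day")
print(f"{total_docs:,} documents per year")
print(f"{size_tb:.2f} TB, {percent_of_limit:.2f}% of the 32 TB figure")
```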

I want to understand how this will perform for the different aggregation queries running against it. I want to create a set of staged aggregation pipelines that write to a different collection, which is then used as source data for business insights analysis.

Across 8.4 billion transactional documents, I aim to build this aggregated data in a different collection using a set of individual services, each writing its output with $out to avoid any issues with the 16 MB document size limit on a single result set.
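
To make the idea concrete, here is a minimal sketch of one such pipeline, built as plain Python dictionaries. The field names (`createdAt`, `customerId`, `header.total`) and the collection name `daily_sales_summary` are hypothetical, since the question does not show the document schema. The stages themselves ($match, $group, $out) are standard aggregation stages.

```python
from datetime import datetime, timedelta, timezone


def build_daily_sales_pipeline(day, target_collection):
    """Return stages that summarise one UTC day of sales per customer."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return [
        # Filter first so the later stages only see one day of documents.
        {"$match": {"createdAt": {"$gte": start, "$lt": end}}},
        # One output document per customer per day (hypothetical fields).
        {"$group": {
            "_id": {
                "customerId": "$customerId",
                "day": {"$dateToString": {"format": "%Y-%m-%d",
                                          "date": "$createdAt"}},
            },
            "invoiceCount": {"$sum": 1},
            "revenue": {"$sum": "$header.total"},
        }},
        # $out must be the last stage; it writes results to a collection
        # instead of returning them in a single response.
        {"$out": target_collection},
    ]


pipeline = build_daily_sales_pipeline(datetime(2016, 8, 9), "daily_sales_summary")
for stage in pipeline:
    print(next(iter(stage)))
```

With PyMongo, a list like this would typically be passed as `db.transactions.aggregate(pipeline, allowDiskUse=True)`, where `transactions` is an assumed collection name. Note that $out replaces the entire target collection on every run, so each service would need its own output collection.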

Am I being overly ambitious here in expecting MongoDB to be able to:

Store that much data in a collection
Aggregate and output the results of refreshed data to drive business insights in a separate collection for consumption by services which provide discrete aspects of a customer's business

Any feedback is welcome. I want to understand where the limits of MongoDB lie, compared with other technologies, for storing and using data at this volume.

Thanks in advance

Answer 1:

There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum size beyond which an existing collection can no longer be sharded.

MongoDB Docs: Sharding Operational Restrictions

For the amount of data you are planning to have it would make sense to go with a sharded cluster from the beginning.
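
As a minimal sketch of what that setup involves, the two admin commands for sharding a collection can be written as plain command documents. The database name `sales`, the collection name `transactions`, and the hashed `customerId` shard key are assumptions for illustration only; the right shard key depends on the actual query patterns.

```python
def build_sharding_commands(database, collection, shard_key):
    """Return the admin command documents that shard one collection."""
    namespace = f"{database}.{collection}"
    return [
        # Allow collections in this database to be sharded.
        {"enableSharding": database},
        # Shard the collection on the given key (hypothetical key below).
        {"shardCollection": namespace, "key": shard_key},
    ]


commands = build_sharding_commands(
    "sales", "transactions", {"customerId": "hashed"}
)
for command in commands:
    print(command)
```

These documents would be run against the `admin` database through a mongos router, for example with PyMongo's `client.admin.command(command)`.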

Source: https://stackoverflow.com/questions/38840669/mongodb-aggregation-performance-capability

Tags

mongodb

aggregation-framework

mongodb-aggregation