问题
I am trying to work through some performance considerations about using MongoDb for a considerable amount of documents to be used in a variety of aggregations.
I have read that a collection has 32TB capcity depending on the sizes of chunk and shard key values.
If I have 65,000 customers who each supply to us (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents getting created daily. When I say a sales transaction, I mean an object which is like an invoice with a header and line items. Each document I have is an average of 2.60kb.
I also have some other data being received by these same customers like account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.
Based upon the above, I approximate 8,392,475,0,00 (8.4 billion) documents in a single year with a total of 20,145,450,000 kb (18.76Tb) of data being stored in a collection.
Based upon the capacity of a MongoDb collection of 32Tb (34,359,738,368 kb) I believe it would be at 58.63% of capacity.
I want to understand how this will perform for different aggregation queries running on it. I want to create a set of staged pipeline aggregations which write to a different collection which are used as source data for business insights analysis.
Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection by a set of individual services which output using $out
to avoid any issues with the 16Mb document size for a single results set.
Am I being overly ambitious here expection MongoDb to be able to:
- Store that much data in a collection
- Aggregate and output the results of refreshed data to drive business insights in a separate collection for consumption by services which provide discrete aspects of a customer's business
Any feedback welcome, I want to understand where the limit is of using MongoDb as opposed to other technologies for quantity data storage and use.
Thanks in advance
回答1:
There is no limit on how big collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with maximum collection size after reaching which it cannot be sharded.
MongoDB Docs: Sharding Operational Restrictions
For the amount of data you are planning to have it would make sense to go with a sharded cluster from the beginning.
来源:https://stackoverflow.com/questions/38840669/mongodb-aggregation-performance-capability