In SQL terms we're storing data like this:
    table events (
        id,
        timestamp,
        dimension1,
        dimension2,
        dimension3,
        etc.
    )
All dimension value
I was also looking at MongoDB, but from what I could read its "group()" function has severe limitations (a maximum of 10,000 rows).
To clarify, that limit is on rows returned: in your example, group() will work for up to 10,000 combinations of dimension1/dimension2. If that's too large, you can fall back to the slower Map/Reduce. Note that if you're running a query with more than 10k results, it may be best to use Map/Reduce and save the output; 10k is a large result set to otherwise just "throw away".
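If you do go the Map/Reduce route, a sketch of saving the rollup might look like the following. This is a minimal sketch, not anything from the answer itself: it assumes an older pymongo (3.x, where Collection.map_reduce() still exists; it was removed in pymongo 4, and the server-side group/mapReduce commands are deprecated or removed in recent MongoDB). The database and collection names (analytics, events, dim_counts) are hypothetical.

    # Minimal sketch: persist a dimension1/dimension2 rollup with
    # Map/Reduce instead of returning it (and throwing it away).
    from pymongo import MongoClient
    from bson.code import Code

    client = MongoClient()
    events = client.analytics.events  # names are assumptions

    # Emit one count per dimension1/dimension2 combination.
    mapper = Code("""
        function () {
            emit({ d1: this.dimension1, d2: this.dimension2 }, 1);
        }
    """)

    # Sum the counts emitted for each key.
    reducer = Code("""
        function (key, values) {
            return Array.sum(values);
        }
    """)

    # Write the rollup to its own collection so a >10k result
    # is saved rather than discarded.
    events.map_reduce(mapper, reducer, out="dim_counts")

    for doc in client.analytics.dim_counts.find().limit(5):
        print(doc)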
Do you have experience with any of these databases, and would you recommend one of them as a solution to the problem described above?
Many people actually use MongoDB to do this type of summary in "real time", but they do it using "counters" instead of "aggregation". Instead of "rolling up" detailed data, they do a regular insert and then increment some counters. In particular, they use atomic modifiers like $inc and $push to update data in a single request.
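As a sketch of that counter pattern (my own illustration, not code from any of the projects mentioned below; pymongo is assumed, and the events/counters collection names and the day/d1/d2 key layout are made up):

    # Minimal sketch of the insert-plus-counters pattern, assuming
    # pymongo; "events" and "counters" collections are hypothetical.
    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient()
    db = client.analytics  # database name is an assumption

    def log_event(dimension1, dimension2, dimension3):
        now = datetime.now(timezone.utc)

        # Regular insert: keep the raw event for later drill-down.
        db.events.insert_one({
            "timestamp": now,
            "dimension1": dimension1,
            "dimension2": dimension2,
            "dimension3": dimension3,
        })

        # Atomic roll-up: $inc bumps a pre-aggregated counter in a
        # single request; upsert=True creates the doc on first use.
        db.counters.update_one(
            {"day": now.strftime("%Y-%m-%d"),
             "d1": dimension1, "d2": dimension2},
            {"$inc": {"count": 1}},
            upsert=True,
        )

    log_event("us", "web", "signup")

Reading a summary then becomes a cheap find() on the counters collection rather than an aggregation over the raw events.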
Take a look at hummingbird for an example of someone doing this right now. There's also an open-source event-logging system backed by MongoDB: Graylog2. ServerDensity also does server event logging backed by MongoDB.
Looking at these may give you some inspiration for the types of logging you want to do.