Which database to choose (Cassandra, MongoDB, ?) for storing and querying event / log / metrics data?

一整个雨季 2021-02-14 09:29

In SQL terms, we're storing data like this:

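CREATE TABLE events (
  -- column types are illustrative assumptions
  id          BIGINT PRIMARY KEY,
  timestamp   TIMESTAMP NOT NULL,
  dimension1  INTEGER,
  dimension2  INTEGER,
  dimension3  INTEGER
  -- etc.: further dimension columns
);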

All dimension value

3 answers
  •  醉话见心
    2021-02-14 09:59

    > Was also looking at MongoDB, but their "group()" function has severe limitations as far as I could read (max of 10,000 rows).

    To clarify, the limit is 10,000 rows *returned*. In your example, this will work for up to 10,000 combinations of dimension1/dimension2. If that's too many, you can use the slower Map/Reduce instead. Note that if you're running a query with more than 10k results, it may be best to use Map/Reduce and save the output: 10k rows is a large query result to otherwise just "throw away".
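    The 10,000 figure caps the number of distinct groups in the result, not the number of input rows. The following is a plain-Python sketch with no MongoDB involved. It mimics the shape of a Map/Reduce count grouped on dimension1/dimension2, and the names `map_phase`, `reduce_phase`, `fits_group_limit` and `GROUP_LIMIT` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

GROUP_LIMIT = 10_000  # hypothetical constant mirroring the group() result cap

Event = Dict[str, Any]
Key = Tuple[Any, Any]


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[Key, int]]:
    """Emit ((dimension1, dimension2), 1) for every event."""
    for event in events:
        yield (event["dimension1"], event["dimension2"]), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum the emitted 1s per key, like COUNT(*) ... GROUP BY dimension1, dimension2."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def fits_group_limit(counts: Dict[Key, int], limit: int = GROUP_LIMIT) -> bool:
    """The cap applies to result rows (distinct groups), not to input events."""
    return len(counts) <= limit


# 100,000 events, but only 3 distinct (dimension1, dimension2) combinations.
events: List[Event] = [
    {"id": i, "dimension1": i % 3, "dimension2": 7} for i in range(100_000)
]
counts = reduce_phase(map_phase(events))
print(counts, fits_group_limit(counts))
```

    Even with 100,000 input events the result has only three rows, so it fits comfortably; the cap only matters once the number of distinct dimension1/dimension2 combinations grows past it.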

    > Do you have experience with any of these databases, and would you recommend it as a solution to the problem described above?

    Many people actually use MongoDB to produce this type of summary in real time, but they do it with "counters" rather than "aggregation". Instead of rolling up detailed data after the fact, they do a regular insert of the raw event and then increment some counters.

    In particular, they use atomic modifiers such as $inc and $push to update the data atomically in a single request.
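    A minimal sketch of the counter approach, assuming one summary document per hour keyed by a hypothetical `_id` of the form `<dimension1>:<YYYY-MM-DDTHH>`. The field names (`total`, `minute`, `by_dimension2`, `recent_ids`) and the helpers `counter_update` and `apply_update` are hypothetical. With pymongo, the (filter, update) pair could be sent as `collection.update_one(flt, update, upsert=True)`; here a tiny in-memory `apply_update` stands in for the server so the example runs without MongoDB, and it only understands $inc and $push.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

Doc = Dict[str, Any]


def counter_update(event: Doc) -> Tuple[Doc, Doc]:
    """Build a (filter, update) pair for a hypothetical hourly summary document.

    All counters for one event are bumped in a single update request,
    which MongoDB applies atomically to that one document.
    """
    ts: datetime = event["timestamp"]
    flt = {"_id": f"{event['dimension1']}:{ts:%Y-%m-%dT%H}"}  # one doc per hour
    update = {
        "$inc": {
            "total": 1,                                 # events this hour
            f"minute.{ts.minute}": 1,                   # per-minute counter
            f"by_dimension2.{event['dimension2']}": 1,  # breakdown by dimension2
        },
        "$push": {"recent_ids": event["id"]},  # appends; unbounded in this sketch
    }
    return flt, update


def apply_update(store: Dict[str, Doc], flt: Doc, update: Doc) -> None:
    """In-memory stand-in for an upserting update; handles only $inc and $push."""
    doc = store.setdefault(flt["_id"], {"_id": flt["_id"]})  # upsert by _id
    for path, amount in update.get("$inc", {}).items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:  # create embedded documents on demand
            target = target.setdefault(part, {})
        target[leaf] = target.get(leaf, 0) + amount
    for field, value in update.get("$push", {}).items():  # top-level fields only
        doc.setdefault(field, []).append(value)


store: Dict[str, Doc] = {}
ts = datetime(2021, 2, 14, 9, 29, tzinfo=timezone.utc)
for i, d2 in enumerate([1, 1, 2]):
    event = {"id": i, "timestamp": ts, "dimension1": 7, "dimension2": d2}
    apply_update(store, *counter_update(event))
print(store["7:2021-02-14T09"])
```

    Reading a summary then becomes a single document fetch instead of a scan over raw events; the trade-off is that you must decide up front which counters to maintain.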

    Take a look at hummingbird for an example of someone doing this right now. There's also an open-source event-logging system backed by MongoDB: Graylog2. ServerDensity also does server event logging backed by MongoDB.

    Looking at these may give you some inspiration for the types of logging you want to do.
