What NoSQL DB to use for sparse Time Series like data?

难免孤独 2021-01-31 22:07

I'm planning a side project in which I will be dealing with time-series-like data. I would like to give one of those shiny new NoSQL DBs a try and am looking for a recommendation.

2 Answers
  • 2021-01-31 22:42

    Have a look at opentsdb.org, an open-source time series database that uses HBase. They have been smart about how they store the time series. It is well documented here: http://opentsdb.net/misc/opentsdb-hbasecon.pdf

  • 2021-01-31 22:47

    I believe all the major NoSQL databases will support that requirement, especially if you don't actually have a large volume of data (which raises the question: why NoSQL?).

    That said, I recently had to design and work with a NoSQL database for time series data, so I can give some input on that design, which can then be extrapolated to the others.

    Our chosen database was Cassandra, and our design was as follows:

    • A single keyspace for all 'symbols'
    • Each symbol was a new row
    • Each time entry was a new column for that relevant row
    • Each value (or set of values, since there can be more than one) was stored as the column value for that time entry

    This lets you achieve everything you asked for, most notably reading the data for a single symbol, restricted to a time range if necessary (column range calls). Although you said performance wasn't critical, it was for us, and this design was quite performant as well: all data for any single symbol is by definition sorted (column name sort) and always stored on the same node (no cross-node communication for simple queries). Finally, this design translates well to other NoSQL databases that have dynamic columns.
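
    To make the layout concrete, here is a minimal in-memory sketch in plain Python. It is not Cassandra client code; the class name `WideRowStore` and its `insert` and `slice` methods are hypothetical names chosen for illustration. It only models the idea described above: one row per symbol, one column per timestamp kept in sorted order, and a range read that touches a single row.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideRowStore:
    """Toy model of the layout: one row per symbol, one column per timestamp.

    Hypothetical illustration only; a real store would persist and
    distribute rows across nodes by row key.
    """

    def __init__(self):
        # row key (symbol) -> sorted list of column names (timestamps)
        self._columns = defaultdict(list)
        # (symbol, timestamp) -> value; a dict lets one entry hold several fields
        self._values = {}

    def insert(self, symbol, timestamp, value):
        key = (symbol, timestamp)
        if key not in self._values:
            # Keep the column names of this row sorted as they arrive.
            insort(self._columns[symbol], timestamp)
        # Writing to an existing column overwrites its value.
        self._values[key] = value

    def slice(self, symbol, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        cols = self._columns.get(symbol, [])
        lo = bisect_left(cols, start)
        hi = bisect_right(cols, end)
        return [(ts, self._values[(symbol, ts)]) for ts in cols[lo:hi]]
```

    Because the columns of a row are already sorted, a call such as `store.slice("ACME", start, end)` is a binary search plus a contiguous read within one row, which mirrors why the column range calls were cheap for us. Storing each value as a dict also shows how extra fields can be added to a time entry later without changing the layout.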

    Further to this, here's some information on using MongoDB (and capped collections if necessary) for a time series store: MongoDB as a Time Series Database

    Finally, here's a discussion of SQL vs NoSQL for time series: https://dba.stackexchange.com/questions/7634/timeseries-sql-or-nosql

    I can add to that discussion the following:

    • The learning curve for NoSQL will be higher; you don't get the added flexibility and functionality for free in terms of 'soft costs'. Who will be supporting this database operationally?
    • If you expect this functionality to grow in future (either through more fields being added to each time entry, or through much larger capacity in terms of the number of symbols or the size of each symbol's time series), then definitely go with NoSQL. The flexibility benefit is huge, and the scalability you get (with the above design) on both the 'per symbol' and 'number of symbols' basis is almost unbounded (I say almost: the maximum number of columns per row is in the billions, and the maximum number of rows per keyspace is, I believe, unbounded).