Storing time-series data, relational or non?

栀梦 2020-11-28 17:04

I am creating a system that polls devices for data on various metrics such as CPU utilisation, disk utilisation, temperature, etc. at (probably) 5-minute intervals using SNMP.

10 answers
  • 2020-11-28 17:34

    I found the answers above very interesting. Here are a couple more considerations.

    1) Data aging

    Time-series management usually requires aging policies. A typical scenario (e.g. monitoring server CPU) requires storing:

    • 1-sec raw samples for a short period (e.g. for 24 hours)

    • 5-min aggregate samples for a medium period (e.g. 1 week)

    • 1-hour aggregates beyond that (e.g. up to 1 year)
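The tiered retention scheme above can be sketched as a rollup step: raw samples are averaged into fixed-width buckets, and each tier keeps only data newer than its retention window. This is a minimal in-memory sketch; the tier table mirrors the example durations in the list, and averaging is an assumption (max or percentile rollups are just as common).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (UNIX timestamp in seconds, value)

# Hypothetical tier table mirroring the example above:
# name -> (bucket width in seconds, retention window in seconds)
TIERS = {
    "raw": (1, 24 * 3600),              # 1-sec samples kept for 24 hours
    "5min": (300, 7 * 24 * 3600),       # 5-min averages kept for 1 week
    "1hour": (3600, 365 * 24 * 3600),   # 1-hour averages kept for 1 year
}


def downsample(samples: List[Sample], width: int) -> List[Sample]:
    """Average samples into buckets of `width` seconds, keyed by bucket start."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % width].append(value)
    return sorted((start, sum(vs) / len(vs)) for start, vs in buckets.items())


def purge(samples: List[Sample], retention: int, now: int) -> List[Sample]:
    """Drop samples older than the retention window."""
    cutoff = now - retention
    return [(ts, v) for ts, v in samples if ts >= cutoff]


def build_tiers(raw: List[Sample], now: int) -> Dict[str, List[Sample]]:
    """Roll raw samples up into every tier, then apply each tier's retention."""
    result = {}
    for name, (width, retention) in TIERS.items():
        # The raw tier is stored as-is; coarser tiers are averaged
        rolled = raw if width == 1 else downsample(raw, width)
        result[name] = purge(rolled, retention, now)
    return result


if __name__ == "__main__":
    raw = [(0, 10.0), (1, 20.0), (299, 30.0), (300, 40.0)]
    tiers = build_tiers(raw, now=600)
    print(tiers["5min"])   # [(0, 20.0), (300, 40.0)]
    print(tiers["1hour"])  # [(0, 25.0)]
```

Rebuilding every tier from the raw data is only for illustration; a real job would roll each tier up incrementally before the raw samples it depends on expire.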

    Relational models can certainly handle this appropriately (my company implemented massive centralized databases for some large customers with tens of thousands of data series), but the new breed of data stores adds interesting functionality worth exploring, such as:

    • automated data purging (see Redis' EXPIRE command)

    • multidimensional aggregations (e.g. map-reduce jobs à la Splunk)

    2) Real-time collection

    Even more importantly, some non-relational data stores are inherently distributed and allow much more efficient real-time (or near-real-time) data collection. That can be a problem for an RDBMS because of hotspots (maintaining indexes while inserting into a single table). In the RDBMS space this is typically solved by falling back to batch import procedures (we managed it this way in the past), while NoSQL technologies have succeeded at massive real-time collection and aggregation (see Splunk, for example, mentioned in previous replies).
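The batch-import workaround mentioned above can be sketched with Python's built-in sqlite3 module: samples are buffered in memory and written with executemany in one transaction per batch, instead of committing each sample individually. SQLite stands in for whatever RDBMS you actually use, and the table layout and default batch size are illustrative assumptions. Batching amortises transaction and commit cost across many rows; it does not remove per-row index maintenance.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, int, int, float]  # (device_id, metric_id, ts, value)


class BatchWriter:
    """Buffers samples and flushes them in a single transaction per batch."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 500):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer: List[Row] = []
        # Hypothetical table layout for illustration
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples ("
            " device_id INTEGER, metric_id INTEGER, ts INTEGER, value REAL,"
            " PRIMARY KEY (device_id, metric_id, ts))"
        )

    def add(self, row: Row) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # The connection context manager commits on success, rolls back on error
        with self.conn:
            self.conn.executemany(
                "INSERT INTO samples VALUES (?, ?, ?, ?)", self.buffer
            )
        self.buffer.clear()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    writer = BatchWriter(conn, batch_size=3)
    for i in range(7):
        writer.add((1, 2, i * 300, float(i)))
    writer.flush()  # write the final partial batch
    print(conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0])  # 7
```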

  • 2020-11-28 17:34

    5 million rows is nothing compared to today's torrential data volumes. Expect data to reach terabytes or petabytes within a few months. At that point an RDBMS does not scale to the task, and you need the linear scalability of NoSQL databases. Performance comes from the columnar partitioning used to store the data: the "more columns, fewer rows" concept. Leverage the OpenTSDB work done on top of HBase or MapR-DB, etc.
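The "more columns, fewer rows" idea can be illustrated with a wide-row layout similar in spirit to OpenTSDB's HBase schema: the row key combines the series identity with an hour-aligned base timestamp, and each sample becomes a column whose qualifier is its offset in seconds within that hour. This plain-Python sketch uses nested dicts in place of HBase; the key format is a simplified assumption, not OpenTSDB's actual binary encoding.

```python
from collections import defaultdict
from typing import Dict, Iterator, Tuple

ROW_SPAN = 3600  # one row per series per hour

# Hypothetical in-memory stand-in for a wide-column table:
# (series, base timestamp) -> {column qualifier (offset in seconds) -> value}
Table = Dict[Tuple[str, int], Dict[int, float]]


def row_key(series: str, ts: int) -> Tuple[str, int]:
    """Series identity plus the hour-aligned base timestamp."""
    return (series, ts - ts % ROW_SPAN)


def put(table: Table, series: str, ts: int, value: float) -> None:
    key = row_key(series, ts)
    table[key][ts - key[1]] = value  # column = offset within the hour


def scan(table: Table, series: str, start: int, end: int) -> Iterator[Tuple[int, float]]:
    """Yield (ts, value) for start <= ts < end, visiting one row per hour."""
    base = start - start % ROW_SPAN
    while base < end:
        row = table.get((series, base), {})
        for offset in sorted(row):
            ts = base + offset
            if start <= ts < end:
                yield ts, row[offset]
        base += ROW_SPAN


if __name__ == "__main__":
    table: Table = defaultdict(dict)
    for ts in (0, 300, 3600, 3900):
        put(table, "dev1.cpu", ts, ts / 100)
    print(len(table))  # 2 rows, each holding 2 columns
    print(list(scan(table, "dev1.cpu", 300, 3901)))
```

With 5-minute polling, each hourly row holds 12 columns, so the row count is about a twelfth of the one-row-per-sample design.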

  • 2020-11-28 17:43

    Create a file named 1_2.data (one file per device/metric pair, here device 1 and metric 2). Weird idea? Here is what you get:

    • You save up to 50% of space because you don't need to repeat the fk_to_device and fk_to_metric values for every data point.
    • You save even more space because you don't need any indices.
    • You store (timestamp, metric_value) pairs by appending them to the file, so you get ordering by timestamp for free (assuming your sources don't send out-of-order data for a device).

    => Queries by timestamp run amazingly fast because you can use binary search to find the right place in the file to read from.
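Binary search on the file works because every record has the same size, so record i starts at byte i * RECORD.size. A minimal sketch, assuming each record is an 8-byte signed timestamp followed by an 8-byte double (little-endian) and that timestamps are appended in increasing order; the record layout and function names are illustrative, not from the answer.

```python
import os
import struct
import tempfile
from typing import BinaryIO, List, Tuple

RECORD = struct.Struct("<qd")  # (timestamp: int64, value: float64), 16 bytes


def append(path: str, ts: int, value: float) -> None:
    """Append one (timestamp, value) pair; order is preserved if ts increases."""
    with open(path, "ab") as f:
        f.write(RECORD.pack(ts, value))


def _read(f: BinaryIO, i: int) -> Tuple[int, float]:
    f.seek(i * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


def lower_bound(f: BinaryIO, n: int, ts: int) -> int:
    """Index of the first record with timestamp >= ts (binary search on disk)."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if _read(f, mid)[0] < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query(path: str, start: int, end: int) -> List[Tuple[int, float]]:
    """Return all records with start <= timestamp < end."""
    n = os.path.getsize(path) // RECORD.size
    out = []
    with open(path, "rb") as f:
        i = lower_bound(f, n, start)
        f.seek(i * RECORD.size)  # then scan forward sequentially
        for _ in range(i, n):
            ts, value = RECORD.unpack(f.read(RECORD.size))
            if ts >= end:
                break
            out.append((ts, value))
    return out


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "1_2.data")
    for ts in range(0, 3000, 300):
        append(path, ts, ts / 10)
    print(query(path, 600, 1500))  # [(600, 60.0), (900, 90.0), (1200, 120.0)]
```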

    If you want it even more optimized, start thinking about splitting your files like this:

    • 1_2_january2014.data
    • 1_2_february2014.data
    • 1_2_march2014.data

    Or use kdb+ from http://kx.com, which does all this for you :) Column-oriented storage is what may help you.
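Splitting by month only requires deriving the file name from the device id, metric id and timestamp, following the 1_2_january2014.data pattern above; a range query then opens only the months it overlaps. A small sketch, assuming UTC timestamps and lowercase English month names as in the example; the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

# Spelled out rather than taken from calendar.month_name, which is locale-dependent
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def _file_name(device_id: int, metric_id: int, year: int, month: int) -> str:
    return f"{device_id}_{metric_id}_{MONTHS[month - 1]}{year}.data"


def partition_file(device_id: int, metric_id: int, ts: int) -> str:
    """Monthly file that a sample with UNIX timestamp `ts` (UTC) belongs to."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return _file_name(device_id, metric_id, dt.year, dt.month)


def files_for_range(device_id: int, metric_id: int, start: int, end: int) -> List[str]:
    """Monthly files a query over [start, end) has to open (requires end > start)."""
    first = datetime.fromtimestamp(start, tz=timezone.utc)
    last = datetime.fromtimestamp(end - 1, tz=timezone.utc)
    year, month = first.year, first.month
    names = []
    while (year, month) <= (last.year, last.month):
        names.append(_file_name(device_id, metric_id, year, month))
        # Advance one month, wrapping December into January of the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names
```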

    There is also a cloud-based column-oriented solution emerging, so you may want to have a look at http://timeseries.guru

  • 2020-11-28 17:45

    If you are looking at GPL packages, RRDTool is a good one to consider. It is a good tool for storing, extracting and graphing time-series data, and your use case looks exactly like time-series data.
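RRDTool stores data in a round-robin database: each archive has a fixed number of slots, and once it is full the oldest value is overwritten, so storage never grows. The idea (not RRDTool's API or file format) can be sketched with a bounded deque; the class name is hypothetical, and the slot count below, one day of 5-minute samples, is an assumption based on the question's polling interval.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoundRobinArchive:
    """Fixed-capacity series: appending to a full archive drops the oldest entry."""

    def __init__(self, slots: int):
        self.data: Deque[Tuple[int, float]] = deque(maxlen=slots)

    def update(self, ts: int, value: float) -> None:
        self.data.append((ts, value))

    def fetch(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return stored samples with start <= ts < end."""
        return [(ts, v) for ts, v in self.data if start <= ts < end]


# One day of 5-minute samples: 24 * 60 / 5 = 288 slots
day = RoundRobinArchive(slots=288)
```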

  • 2020-11-28 17:46

    You should look into time-series databases; they were created for exactly this purpose.

    A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range).

    A popular example of a time-series database is InfluxDB.
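InfluxDB writes are commonly sent in its text-based line protocol, `measurement,tag_key=tag_value field_key=field_value timestamp`, with the timestamp in nanoseconds by default. A sketch of formatting one SNMP poll result that way; the measurement, tag and field names are hypothetical, only float fields are handled, and the helper assumes names and values contain no spaces, commas or equals signs (line protocol requires those to be escaped).

```python
from typing import Dict


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], ts_ns: int) -> str:
    """Format one point as InfluxDB line protocol (float fields only, no escaping)."""
    # Tags sorted by key, as InfluxDB recommends for write performance
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={float(v)!r}" for k, v in fields.items())
    return f"{measurement}{tag_str} {field_str} {ts_ns}"


if __name__ == "__main__":
    print(to_line_protocol("cpu", {"host": "router1"}, {"usage": 42.5},
                           1606554240000000000))
    # cpu,host=router1 usage=42.5 1606554240000000000
```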

  • 2020-11-28 17:48

    I face similar requirements regularly, and have recently started using Zabbix to gather and store this type of data. Zabbix has its own graphing capability, but it's easy enough to extract the data out of Zabbix's database and process it however you like. If you haven't already checked Zabbix out, you might find it worth your time to do so.
