Best practices with historical data in MySQL database

北海茫月 2021-01-12 01:10

Recently I have been thinking about best practices for storing historical data in a MySQL database. For now, each versionable table has two columns - valid_from and

3 Answers
  • 2021-01-12 01:21

    It's a common mistake to worry about "large" tables and performance. If you can use indexes to access your data, it doesn't really matter whether you have 1,000 or 1,000,000 records - at least not to a degree you'd be able to measure. The design you mention is commonly used; it's a great design where time is a key part of the business logic.

    For instance, if you want to know what the price of an item was at the point when the client placed the order, being able to search product records where valid_from < order_date and valid_until is either null or > order_date is by far the easiest solution.

    This isn't always the case - if you're keeping the data around just for archive purposes, it may make more sense to create archive tables. However, you have to be sure that time is really not part of the business logic, otherwise the pain of searching multiple tables will be significant - imagine having to search either the product table OR the product_archive table every time you want to find out about the price of a product at the point the order was placed.
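A minimal sketch of the point-in-time lookup this answer describes. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server; the WHERE clause is plain SQL and should carry over. The product table layout, the price column and the price_at helper are assumptions made for illustration, and the comparison uses valid_from <= order_date on the assumption that each version is valid from its start date inclusive.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        product_id  INTEGER NOT NULL,
        price       REAL    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO dates compare correctly as strings
        valid_until TEXT               -- NULL marks the current version
    )
    """
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, 9.99, "2020-01-01", "2020-07-01"),   # expired version
        (1, 12.50, "2020-07-01", None),          # current version
    ],
)


def price_at(conn, product_id, order_date):
    """Return the price valid on order_date, or None if no version matches."""
    row = conn.execute(
        """
        SELECT price
        FROM product
        WHERE product_id = ?
          AND valid_from <= ?
          AND (valid_until IS NULL OR valid_until > ?)
        """,
        (product_id, order_date, order_date),
    ).fetchone()
    return row[0] if row else None
```

With a separate product_archive table, the same lookup would have to query both tables (for example with a UNION), which is the extra cost the answer warns about.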

  • 2021-01-12 01:31

    This is not a complete answer, just a few suggestions.

    You can add an indexed boolean field such as is_valid. This should improve performance on a big table that holds both historical and current records.

    In general, storing historical data in a separate table may complicate your application (just imagine the complexity of a query that has to return a mix of current and historical records...).

    Computers today are really fast. I think you should test and compare the performance of a single table against a separate table for historical records.

    In addition, test your hardware to see how fast MySQL is with big tables before deciding how to design the database. If it's too slow for you, you can tune the MySQL configuration (start with increasing cache/RAM).
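One way the is_valid suggestion could look, sketched with Python's built-in sqlite3 module standing in for MySQL. The table layout, the index name idx_product_current and the helpers set_price and current_price are hypothetical. Pairing is_valid with product_id in one index is also an assumption; the answer only says the flag should be indexed.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        product_id  INTEGER NOT NULL,
        price       REAL    NOT NULL,
        valid_from  TEXT    NOT NULL,
        valid_until TEXT,
        is_valid    INTEGER NOT NULL DEFAULT 1  -- 1 = current, 0 = historical
    )
    """
)
conn.execute("CREATE INDEX idx_product_current ON product (product_id, is_valid)")


def set_price(conn, product_id, price, changed_on):
    """Close the current version (if any) and insert the new one atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE product SET valid_until = ?, is_valid = 0 "
            "WHERE product_id = ? AND is_valid = 1",
            (changed_on, product_id),
        )
        conn.execute(
            "INSERT INTO product (product_id, price, valid_from) VALUES (?, ?, ?)",
            (product_id, price, changed_on),
        )


def current_price(conn, product_id):
    """Return the price of the current version, or None if there is none."""
    row = conn.execute(
        "SELECT price FROM product WHERE product_id = ? AND is_valid = 1",
        (product_id,),
    ).fetchone()
    return row[0] if row else None


set_price(conn, 1, 9.99, "2020-01-01")
set_price(conn, 1, 12.50, "2020-07-01")
```

The flag duplicates what valid_until IS NULL already says, so both have to be updated together; doing it in one transaction, as above, keeps them from drifting apart.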

  • 2021-01-12 01:38

    I'm nearing completion of an application which does exactly this. Most of my indexes cover the key fields first and then the valid_to field, which is set to NULL for current records, so current records can be found easily and quickly. Since most of my application deals with real-time operations, the indexes provide fast performance. Once in a while someone needs to see historical records, and in that case there's a performance hit, but from testing it's not too bad, since most records don't have very many changes over their lifetime.

    In cases where you have far more expired records than current records across the various keys, it may pay to index on valid_to before any key fields.
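The index layout from this answer can be sketched as follows, again with Python's built-in sqlite3 module in place of MySQL. The account table, its columns, the index name and the helper functions are hypothetical. The query_plan helper relies on SQLite's EXPLAIN QUERY PLAN; in MySQL the equivalent check would be EXPLAIN, whose output has a different shape.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        account_id INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        valid_from TEXT    NOT NULL,
        valid_to   TEXT               -- NULL marks the current record
    )
    """
)
# Key field first, then valid_to, as the answer describes.
conn.execute("CREATE INDEX idx_account_key_valid_to ON account (account_id, valid_to)")
# Alternative for tables dominated by expired rows (not created here):
#   CREATE INDEX idx_account_valid_to_key ON account (valid_to, account_id)

conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        (1, "pending", "2020-01-01", "2020-02-01"),
        (1, "active", "2020-02-01", None),
        (2, "active", "2020-03-01", None),
    ],
)

CURRENT_SQL = "SELECT status FROM account WHERE account_id = ? AND valid_to IS NULL"
HISTORY_SQL = (
    "SELECT status, valid_from, valid_to FROM account "
    "WHERE account_id = ? ORDER BY valid_from"
)


def current_status(conn, account_id):
    """Return the status of the current record, or None if there is none."""
    row = conn.execute(CURRENT_SQL, (account_id,)).fetchone()
    return row[0] if row else None


def history(conn, account_id):
    """Return every version of the record, oldest first."""
    return conn.execute(HISTORY_SQL, (account_id,)).fetchall()


def query_plan(conn, sql, params):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

Calling query_plan(conn, CURRENT_SQL, (1,)) shows whether the current-record lookup is served by the index; comparing that with the alternative column order on realistic data volumes is the kind of test the answer recommends.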
