How do I not normalize continuous data (INTS, FLOATS, DATETIME, …)?

耶瑟儿~ 2021-01-24 16:11

According to my understanding - and correct me if I'm wrong - "Normalization" is the process of removing redundant data from the database design.

However, when I w

1 answer
  • 2021-01-24 17:01

    The Comments (so far) are discussing the misuse of the term "normalization". I accept that criticism. Is there a term for what is being discussed?

    Let me elaborate on my 'claim' with this example... Some DBAs replace a DATE with a surrogate ID; this is likely to cause significant performance issues when a date range is used. Contrast these:

    -- single table
    SELECT ...
        FROM t
        WHERE x = ...
          AND date BETWEEN ... AND ...;   -- `date` is of datatype DATE/DATETIME/etc
    
    -- extra table
    SELECT ...
        FROM t
        JOIN Dates AS d  ON t.date_id = d.date_id
        WHERE t.x = ...
          AND d.date BETWEEN ... AND ...;  -- Range test is now in the other table
    

    Moving the range test to a JOINed table causes the slowdown.

    The first query is quite optimizable via

    INDEX(x, date)
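
    A minimal sketch of how that index could be declared and checked, assuming `x` is an INT and the table has an `id` primary key (neither is stated above); the literal values are placeholders. The equality column comes first and the range column last, so the Optimizer can locate `x = 42` in the index and then scan only the matching slice of dates:

    ```sql
    -- Hypothetical schema: only `x` and `date` come from the query above.
    CREATE TABLE t (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
        x    INT NOT NULL,
        date DATE NOT NULL,
        PRIMARY KEY (id),
        INDEX x_date (x, date)   -- equality column first, range column last
    );

    -- EXPLAIN shows which index the Optimizer picks; the exact plan
    -- depends on the MySQL version and the data distribution.
    EXPLAIN
    SELECT id, x, date
        FROM t
        WHERE x = 42
          AND date BETWEEN '2021-01-01' AND '2021-01-31';
    ```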
    

    In the second query, the Optimizer will (for MySQL at least) pick one of the two tables to start with, then do a somewhat tedious back-and-forth to the other table to handle the rest of the WHERE. (Other engines have other techniques, but there is still a significant cost.)
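
    To make the back-and-forth concrete, here is a sketch of the two-table variant under assumed names: `t_ids` stands in for `t`, and the column types, index names, and literal values are illustrative only. Whichever table the Optimizer starts with, only one of the two conditions can narrow the first scan; the other is applied through per-row lookups:

    ```sql
    -- Hypothetical schema for the surrogate-ID design.
    CREATE TABLE Dates (
        date_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        date    DATE NOT NULL,
        PRIMARY KEY (date_id),
        UNIQUE KEY uq_date (date)
    );

    CREATE TABLE t_ids (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        x       INT NOT NULL,
        date_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (id),
        INDEX x_date_id (x, date_id)
    );

    -- Roughly, the two possible join orders are:
    --   Start with Dates: range scan on `date`, then one lookup into
    --     t_ids via (x, date_id) for each matching date.
    --   Start with t_ids: read every row with x = 42, then look up each
    --     date_id in Dates to test the range.
    EXPLAIN
    SELECT t.id, t.x, d.date
        FROM t_ids AS t
        JOIN Dates AS d  ON t.date_id = d.date_id
        WHERE t.x = 42
          AND d.date BETWEEN '2021-01-01' AND '2021-01-31';
    ```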

    DATE is one of several datatypes where you are likely to have a "range" test. Hence my proclamations about it applying to any "continuous" datatypes (ints, dates, floats).

    Even if you don't have a range test, there may be no performance benefit from the secondary table. I often see a 3-byte DATE being replaced by a 4-byte INT, thereby making the main table larger! A "composite" index will almost always lead to a more efficient query for the single-table approach.
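
    One way to check the size claim on your own data is to build both variants and compare what MySQL reports for them. This is only a sketch: it lists every table in the current database, and for InnoDB the figures are estimates rather than exact byte counts.

    ```sql
    -- DATE takes 3 bytes and INT takes 4 bytes per value in MySQL,
    -- before any index overhead.
    -- Approximate data and index sizes for tables in the current database:
    SELECT TABLE_NAME,
           DATA_LENGTH,
           INDEX_LENGTH
        FROM information_schema.TABLES
        WHERE TABLE_SCHEMA = DATABASE();
    ```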
