When to Denormalize a Database Design

半阙折子戏 2020-11-29 15:52

I know that normalization has been extensively discussed on Stack Overflow. I've read many of the previous discussions. I've got some additional questions though.

I

  • 2020-11-29 16:37

    I agree with your senior about (1). A transaction table row must capture the entire state at the moment of the transaction. Period. What you're suggesting doesn't record the actual data, so it's inadmissible. I also agree about (2). Whatever the business wants by way of cross-checking, you must implement. Accounting is based on cross-checking, double entry, rolling up ledgers, etc. You must do it. This is so fundamental that you shouldn't even think of it as denormalization, just as implementing the business requirement.
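
    One way to implement the cross-checking this answer insists on is a reconciliation query that recomputes each rolled-up ledger balance from the individual postings and reports any account where the two disagree. This is only an illustrative sketch: it assumes Python's built-in sqlite3 module, hypothetical `posting` and `ledger_balance` tables, and amounts held as integer cents; a real system would follow the accountants' own rules.

```python
import sqlite3

# Hypothetical schema: one row per posting, plus a rolled-up balance per account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (
    posting_id INTEGER PRIMARY KEY,
    account    TEXT    NOT NULL,
    amount     INTEGER NOT NULL          -- integer cents avoid float rounding
);
CREATE TABLE ledger_balance (
    account TEXT PRIMARY KEY,
    balance INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO posting (account, amount) VALUES (?, ?)",
                 [("cash", 10000), ("cash", -2500), ("sales", -10000), ("sales", 2500)])
# Deliberately store a wrong rolled-up balance for "sales" so the check fires.
conn.executemany("INSERT INTO ledger_balance VALUES (?, ?)",
                 [("cash", 7500), ("sales", -7000)])

def cross_check(conn):
    """Return (account, stored, recomputed) for every account whose stored
    balance disagrees with the sum of its postings."""
    return conn.execute("""
        SELECT l.account, l.balance, COALESCE(SUM(p.amount), 0) AS recomputed
        FROM ledger_balance AS l
        LEFT JOIN posting AS p ON p.account = l.account
        GROUP BY l.account, l.balance
        HAVING l.balance <> COALESCE(SUM(p.amount), 0)
        ORDER BY l.account
    """).fetchall()

print(cross_check(conn))  # → [('sales', -7000, -7500)]
```

    Run periodically (or before closing a period), an empty result means the rolled-up figures still agree with the detail they were built from.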

  • 2020-11-29 16:39

    1) This is an archive. Everything that is in it should never be updated. I'd go with the senior guy's suggestion and have that invoice table be self-contained. Perhaps use a blob for the invoice itself that contains markup language?

    2) Reporting services, a warehouse table that is trigger-updated, something you build by script whenever... these would all be fine, I think. It is indeed ideal to be normalized, but it isn't always fast. I manage a good-sized healthcare database which is fully normalized... and then has a series of denormalized tables with rolled-up calculations and commonly pulled fields. Almost everything runs from that denormalized set -- it's just faster to append to these with a trigger when files are loaded than to keep pulling from various tables every time I want to look at a 100,000-record report.
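
    A minimal sketch of the trigger-updated approach, assuming SQLite via Python's sqlite3 module. The `patient`, `claim`, and `claim_report` tables are hypothetical stand-ins, not the answerer's actual schema: the trigger copies the patient name into a flat reporting table each time a claim row is loaded, so reports read one table instead of joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical names).
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE claim (
    claim_id   INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    amount     INTEGER NOT NULL
);

-- Denormalized reporting table: the patient name is copied in.
CREATE TABLE claim_report (
    claim_id     INTEGER PRIMARY KEY,
    patient_name TEXT    NOT NULL,
    amount       INTEGER NOT NULL
);

-- Append to the reporting table whenever a claim is loaded.
CREATE TRIGGER claim_to_report AFTER INSERT ON claim
BEGIN
    INSERT INTO claim_report (claim_id, patient_name, amount)
    SELECT NEW.claim_id, p.name, NEW.amount
    FROM patient AS p
    WHERE p.patient_id = NEW.patient_id;
END;
""")

conn.execute("INSERT INTO patient VALUES (1, 'Alice'), (2, 'Bob')")
conn.executemany("INSERT INTO claim (patient_id, amount) VALUES (?, ?)",
                 [(1, 120), (2, 80), (1, 45)])

# The report reads a single table instead of joining on every query.
rows = conn.execute(
    "SELECT patient_name, SUM(amount) FROM claim_report "
    "GROUP BY patient_name ORDER BY patient_name").fetchall()
print(rows)  # → [('Alice', 165), ('Bob', 80)]
```

    The cost moves to load time: each insert does a little extra work so that the 100,000-record report does not have to.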

  • 2020-11-29 16:39

    For #1

    The invoice should be calculated from the sales and payments. If you do not have detailed sales data, including price/product/discount/shipping/etc., start there.
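
    As an illustration of deriving the invoice rather than storing it, the view below computes goods, shipping, payments, and balance due from detail rows. It is a sketch under assumptions: SQLite via Python's sqlite3, hypothetical `sale_line`, `shipping`, and `payment` tables, a per-line discount, and integer cents. Payments are summed in a subquery before joining so that several payments do not multiply the sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail tables; amounts are integer cents.
CREATE TABLE sale_line (
    invoice_id INTEGER NOT NULL,
    product    TEXT    NOT NULL,
    qty        INTEGER NOT NULL,
    unit_price INTEGER NOT NULL,
    discount   INTEGER NOT NULL DEFAULT 0   -- discount for the whole line
);
CREATE TABLE shipping (invoice_id INTEGER PRIMARY KEY, charge INTEGER NOT NULL);
CREATE TABLE payment  (invoice_id INTEGER NOT NULL, amount INTEGER NOT NULL);

-- The invoice figures are computed from the detail rows, never keyed in.
CREATE VIEW invoice_summary AS
SELECT s.invoice_id,
       s.goods,
       COALESCE(sh.charge, 0) AS shipping,
       COALESCE(p.paid, 0)    AS paid,
       s.goods + COALESCE(sh.charge, 0) - COALESCE(p.paid, 0) AS balance_due
FROM (SELECT invoice_id, SUM(qty * unit_price - discount) AS goods
      FROM sale_line GROUP BY invoice_id) AS s
LEFT JOIN shipping AS sh ON sh.invoice_id = s.invoice_id
LEFT JOIN (SELECT invoice_id, SUM(amount) AS paid
           FROM payment GROUP BY invoice_id) AS p ON p.invoice_id = s.invoice_id;
""")

conn.executemany("INSERT INTO sale_line VALUES (?, ?, ?, ?, ?)", [
    (1, "widget", 3, 1000, 500),   # 3000 - 500 = 2500
    (1, "gadget", 1, 4000, 0),     # 4000
])
conn.execute("INSERT INTO shipping VALUES (1, 700)")
conn.executemany("INSERT INTO payment VALUES (?, ?)", [(1, 3000), (1, 2000)])

print(conn.execute("SELECT * FROM invoice_summary").fetchall())
# → [(1, 6500, 700, 5000, 2200)]
```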

    For #2

    Writing an accounting system into the db from scratch is a big project. Make sure the accountants give you business rules so you can measure your system's accuracy. The last thing you want is for the CFO to step into the DBA meeting and announce that the DB is overcharging customers or, even worse, undercharging them and driving the company out of business.

    If you have SQL Server, give the Adventure Works db a look. If you hate MS, then look at Adventure Works and don't do it that way.

  • 2020-11-29 16:40

    1) Does not require denormalization. You just need to determine what level of detail of each change you need and persist that with an appropriate key.
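
    One common way to persist each change with an appropriate key is an effective-dated history table keyed by product and the date a price took effect; the price on any past date is then the latest row on or before that date. This is a sketch, assuming SQLite via Python's sqlite3, a hypothetical `product_price` table, and ISO-8601 date strings (which sort correctly as text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical price history: one row per price change.
CREATE TABLE product_price (
    product_id     INTEGER NOT NULL,
    effective_from TEXT    NOT NULL,  -- ISO-8601 date: text order = date order
    unit_price     INTEGER NOT NULL,  -- cents
    PRIMARY KEY (product_id, effective_from)
);
""")
conn.executemany("INSERT INTO product_price VALUES (?, ?, ?)", [
    (1, "2019-01-01", 1000),
    (1, "2020-03-15", 1200),
    (1, "2020-10-01", 1150),
])

def price_as_of(conn, product_id, day):
    """Return the price in effect for product_id on the given ISO date,
    or None if no price had been set yet."""
    row = conn.execute("""
        SELECT unit_price FROM product_price
        WHERE product_id = ? AND effective_from <= ?
        ORDER BY effective_from DESC
        LIMIT 1
    """, (product_id, day)).fetchone()
    return row[0] if row else None

print(price_as_of(conn, 1, "2020-06-30"))  # → 1200
```

    Old rows are never updated, only new ones added, so the history itself is the record of what each price was and when.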

    2) Has nothing to do with denormalization. Storing summary data does not make the database denormalized. Storing results derived from non-key attributes in the same table would be an example of denormalization, but that doesn't seem to be what you are talking about here.
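
    To illustrate storing summary data alongside the detail rows, the sketch below keeps totals for closed months in a separate table and recalculates only the current month. Everything here is hypothetical (SQLite via Python's sqlite3, the `sale` and `monthly_total` tables, `YYYY-MM` month keys); the point is that the summary rows are derived from, and never replace, the detail data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (sale_date TEXT NOT NULL, amount INTEGER NOT NULL);
-- Summary table: one row per closed month, derived from "sale".
CREATE TABLE monthly_total (month TEXT PRIMARY KEY, total INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [
    ("2020-09-03", 100), ("2020-09-20", 250),
    ("2020-10-11", 400),
    ("2020-11-02", 75), ("2020-11-28", 25),
])

def close_month(conn, month):
    """Store the total for a finished month (format 'YYYY-MM')."""
    conn.execute("""
        INSERT OR REPLACE INTO monthly_total (month, total)
        SELECT ?, COALESCE(SUM(amount), 0) FROM sale
        WHERE substr(sale_date, 1, 7) = ?
    """, (month, month))

def monthly_report(conn, current_month):
    """Closed months come from the summary table; only the current
    month is recalculated from the detail rows."""
    closed = conn.execute(
        "SELECT month, total FROM monthly_total WHERE month < ? ORDER BY month",
        (current_month,)).fetchall()
    (live,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sale WHERE substr(sale_date, 1, 7) = ?",
        (current_month,)).fetchone()
    return closed + [(current_month, live)]

close_month(conn, "2020-09")
close_month(conn, "2020-10")
print(monthly_report(conn, "2020-11"))
# → [('2020-09', 350), ('2020-10', 400), ('2020-11', 100)]
```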

  • 2020-11-29 16:41

    Database normalization removes duplicates and makes SQL update queries more efficient (and gives some other improvements).

    But if most of your queries select data, and those select queries join several tables at a time, you may consider denormalizing these tables. It will increase the disk space needed for the data and the execution time of SQL update queries, but it will speed up select queries.
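
    The trade-off can be seen in a small sketch, assuming SQLite via Python's sqlite3 and hypothetical `customer`, `orders`, and `orders_wide` tables: the denormalized table answers the same read query without a join, but renaming one customer must rewrite every row that carries a copy of the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the customer name lives in exactly one row.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       INTEGER NOT NULL
);
-- Denormalized read table: the name is copied into every order row.
CREATE TABLE orders_wide (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT    NOT NULL,
    total         INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(10, 500), (11, 700), (12, 300)])
conn.execute("""INSERT INTO orders_wide
                SELECT o.order_id, c.name, o.total
                FROM orders AS o JOIN customer AS c USING (customer_id)""")

# Reads: the wide table gives the same answer without a join.
joined = conn.execute("""SELECT c.name, SUM(o.total) FROM orders AS o
                         JOIN customer AS c USING (customer_id)
                         GROUP BY c.name""").fetchall()
wide = conn.execute("SELECT customer_name, SUM(total) FROM orders_wide "
                    "GROUP BY customer_name").fetchall()

# Writes: a rename touches 1 row when normalized, every copy when not.
norm_rows = conn.execute(
    "UPDATE customer SET name = 'Acme Inc' WHERE customer_id = 1").rowcount
wide_rows = conn.execute("UPDATE orders_wide SET customer_name = 'Acme Inc' "
                         "WHERE customer_name = 'Acme Ltd'").rowcount
print(joined == wide, norm_rows, wide_rows)  # → True 1 3
```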

  • 2020-11-29 16:48

    Your senior colleague is a developer, not a data modeller. You are better off starting from scratch, without them. Normalisation is complicated only to those who will not read books. It is fair enough that he makes you think, but some of the issues are absurd.

    Your numbers:

    1. You need to appreciate the differences between actual online data and historic data; then the difference between merely historic and archival needs. Each of them is right for its specific business requirement and wrong for all others; there is no universal right and wrong.

      • why is there no paper-based copy of the invoice? In most countries that would be a legal and tax requirement; what exactly is the difficulty of fishing out the old invoice?
      • where the database has the requirement of storing the closed invoices, then sure, as soon as the invoice is closed, you need a method of capturing that information.
      • ProductPrice (actually, I would call it ProductDate) is a good idea, but may not be necessary. But you are right, you need to evaluate the currency of data, in the full context of the whole database.
      • I cannot see how copying the product price to the invoice table would help (aren't there many line items?)
      • in modern databases, where the copy of the invoice is required to be regurgitated, the closed Invoice is additionally stored in a different form, e.g. XML. One customer saves the PDFs as BLOBs. So there is no messing around with what the product price was five years ago. But the basic invoice data is online and current, even for closed invoices; you just cannot recalculate an ancient invoice using current prices.
      • some people use an archive_invoice table, but that has problems because now every code segment or user report tool has to look in two places (note that these days some users understand databases better than most developers)
      • Anyway, that is all discussion, for your understanding.
        • The database serves current and archival purposes from the one set of tables (no "archive" tables)
        • Once an Invoice is created, it is a legal document, and cannot be changed or deleted (it can be reversed or partially credited by a new Invoice, with negative values). They are marked IsIssued/IsPaid/etc.
        • Products cannot be deleted, they can be marked IsObsolete
        • There are separate tables for InvoiceHeader and InvoiceItem
        • InvoiceItem has FKs to both InvoiceHeader and Product
        • for many reasons (not only those you mention), the InvoiceItem row contains the NumUnits, ProductPrice, TaxAmount and ExtendedPrice. Sure, this looks like a "denormalisation", but it is not, because prices, taxation rates, etc. are subject to change. More importantly, the legal requirement is that we can reproduce the old invoice on demand.
        • (where it can be reproduced from paper files, this is not required)
        • the InvoiceTotalAmount is a derived column, just the SUM() of the InvoiceItems
    2. That is rubbish. Accounting systems and accountants do not "work" like that.

      • If it is a true accounting system, then it will have JournalEntries, or "double entry"; that is what a qualified accountant is required to use (by law).

      • Double Entry does not mean duplicate entry; it means every financial transaction (one amount) shall have a source account and target account that it is applied to; so there is no "denormalisation" or duplication. In a banking database, because the financial transactions are against single accounts, that is commonly rendered as two separate financial transactions (rows) within one Db Transaction. Ordinary commercial database constraints are used to ensure that there are two "sides" to every financial transaction.

      • Ensuring that Invoices are not deletable is a separate issue, to do with security, etc. If anyone is paranoid about things being deleted from their database, and their database was not secured by a qualified person, then they have more and different problems that have nothing to do with this question. Obtain a security audit, and do whatever they tell you.

      • Wikipedia is not a reliable source of information about normalisation.

      • A Normalised database is always much faster than an Unnormalised one. So it is very important to understand what Normalisation and Denormalisation are, and what they are not. The process is greatly hindered when people have fluid and amateur "definitions"; it just leads to confusion and time-wasting "discussions". When you have fixed definitions, you can avoid all that, and just get on with the job.

      • Summary tables are quite normal; they save the time and processing power of recalculating info that does not change, e.g. YTD totals for every year except this year, and MTD totals for every month of this year except this month. "Always recalculating" data is a bit silly when (a) the info is very large and (b) it does not change. Calculate for the current month only.

        • In banking systems (millions of Trades per day), at EndOfDay, we calculate and store Daily Totals as well. These are overwritten for the last five days, because Auditors are making changes, and JournalEntries against financial transactions for the last five days are allowed.
        • non-banking systems generally do not need daily totals
      • Summary tables are not a "denormalisation" (except in the eyes of those who have just learned about "normalisation" from their magical, ever-changing fluid "source"; or as non-practitioners, who apply simple black-or-white rules to everything). Again, the definition is not being argued here; it simply does not apply to Summary tables.

      • Summary tables do not affect data integrity (assuming of course that the data that they were sourced from was integral).

      • Summary tables are an addition to the database, which are not required to have the same constraints as the database. They are essentially reporting tables or data warehouse tables, as opposed to database tables.

      • There are no Update Anomalies (which is a strict definition) related to Summary tables. You cannot change or delete an invoice from last year. Update Anomalies apply to true Denormalised or Unnormalised current data.
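
    The invoice design described in point 1 can be sketched as follows. This is an illustration under assumptions, not the answerer's schema: it uses SQLite via Python's sqlite3, keeps the column names from the answer, and adds hypothetical pieces of its own (integer-cent amounts, a flat 10% `TAX_PERCENT`, `ExtendedPrice = NumUnits * ProductPrice + TaxAmount`, an `add_item` helper, and triggers that block deletes). The price is copied onto each InvoiceItem when it is invoiced, the total is a view over the items, and a partial credit is a new invoice with negative units.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Product (
    ProductId  INTEGER PRIMARY KEY,
    Name       TEXT    NOT NULL,
    ListPrice  INTEGER NOT NULL,           -- current price in cents
    IsObsolete INTEGER NOT NULL DEFAULT 0  -- retire products instead of deleting
);
CREATE TABLE InvoiceHeader (
    InvoiceId   INTEGER PRIMARY KEY,
    InvoiceDate TEXT    NOT NULL,
    IsIssued    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE InvoiceItem (
    InvoiceId     INTEGER NOT NULL REFERENCES InvoiceHeader(InvoiceId),
    ItemNo        INTEGER NOT NULL,
    ProductId     INTEGER NOT NULL REFERENCES Product(ProductId),
    NumUnits      INTEGER NOT NULL,
    ProductPrice  INTEGER NOT NULL,  -- price as charged, copied at invoicing time
    TaxAmount     INTEGER NOT NULL,
    ExtendedPrice INTEGER NOT NULL,
    PRIMARY KEY (InvoiceId, ItemNo)
);
-- Hypothetical guards: products and issued invoices cannot be deleted.
CREATE TRIGGER NoProductDelete BEFORE DELETE ON Product
BEGIN SELECT RAISE(ABORT, 'mark the product IsObsolete instead'); END;
CREATE TRIGGER NoIssuedInvoiceDelete BEFORE DELETE ON InvoiceHeader
WHEN OLD.IsIssued = 1
BEGIN SELECT RAISE(ABORT, 'issue a reversing invoice instead'); END;
-- InvoiceTotalAmount is derived, not stored: the SUM() of the items.
CREATE VIEW InvoiceTotal AS
SELECT InvoiceId, SUM(ExtendedPrice) AS InvoiceTotalAmount
FROM InvoiceItem GROUP BY InvoiceId;
""")

TAX_PERCENT = 10  # hypothetical flat tax rate

def add_item(conn, invoice_id, item_no, product_id, num_units, price=None):
    """Insert a line, copying the product's current list price unless an
    explicit price is given (e.g. when crediting at the original price)."""
    if price is None:
        (price,) = conn.execute("SELECT ListPrice FROM Product WHERE ProductId = ?",
                                (product_id,)).fetchone()
    tax = num_units * price * TAX_PERCENT // 100
    conn.execute("INSERT INTO InvoiceItem VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (invoice_id, item_no, product_id, num_units, price, tax,
                  num_units * price + tax))

conn.execute("INSERT INTO Product VALUES (1, 'Widget', 1000, 0)")
conn.execute("INSERT INTO InvoiceHeader VALUES (1, '2020-01-10', 1)")
add_item(conn, 1, 1, 1, 3)                 # 3 x 1000 + 300 tax = 3300
conn.execute("UPDATE Product SET ListPrice = 1200 WHERE ProductId = 1")

# Partial credit: a new invoice with negative units at the original price.
conn.execute("INSERT INTO InvoiceHeader VALUES (2, '2020-02-01', 1)")
add_item(conn, 2, 1, 1, -1, price=1000)    # -1000 - 100 tax = -1100

try:
    conn.execute("DELETE FROM InvoiceHeader WHERE InvoiceId = 1")
    delete_blocked = False
except sqlite3.DatabaseError:
    delete_blocked = True

totals = conn.execute("SELECT * FROM InvoiceTotal ORDER BY InvoiceId").fetchall()
print(totals, delete_blocked)  # → [(1, 3300), (2, -1100)] True
```

    The later price change to 1200 does not touch invoice 1, because its line carries the price that was actually charged.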

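    For the double-entry point in item 2, here is a sketch of the banking-style rendering the answer mentions: each financial transaction is written as two rows (one per side) inside a single database transaction, so either both sides are recorded or neither is. It assumes SQLite via Python's sqlite3 and a hypothetical `journal_entry` table; the CHECK constraints and the `unbalanced` query stand in for the "ordinary commercial database constraints" and are not a complete accounting design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE account (account_id TEXT PRIMARY KEY);
CREATE TABLE journal_entry (
    txn_id     INTEGER NOT NULL,
    leg        INTEGER NOT NULL CHECK (leg IN (1, 2)),  -- 1 = target, 2 = source
    account_id TEXT    NOT NULL REFERENCES account(account_id),
    amount     INTEGER NOT NULL CHECK (amount <> 0),    -- cents; debit > 0, credit < 0
    PRIMARY KEY (txn_id, leg)                           -- at most two sides per txn
);
""")
conn.executemany("INSERT INTO account VALUES (?)", [("cash",), ("revenue",)])

def post(conn, txn_id, source, target, amount):
    """Record one financial transaction as two rows inside one DB
    transaction: either both sides are written or neither is."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO journal_entry VALUES (?, 1, ?, ?)",
                     (txn_id, target, amount))
        conn.execute("INSERT INTO journal_entry VALUES (?, 2, ?, ?)",
                     (txn_id, source, -amount))

def unbalanced(conn):
    """Transactions that do not have two sides netting to zero."""
    return conn.execute("""SELECT txn_id FROM journal_entry
                           GROUP BY txn_id
                           HAVING SUM(amount) <> 0 OR COUNT(*) <> 2""").fetchall()

post(conn, 1, "revenue", "cash", 5000)
try:
    post(conn, 2, "nonexistent", "cash", 700)  # second leg violates the FK
except sqlite3.IntegrityError:
    pass  # the first leg of txn 2 was rolled back with it

rows = conn.execute("SELECT txn_id, leg, account_id, amount FROM journal_entry "
                    "ORDER BY txn_id, leg").fetchall()
print(rows, unbalanced(conn))
```

    Note that one amount is recorded once per side, not duplicated: the two rows are the source and target of the same transaction.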