How can Hard Deletes work when Foreign Keys are involved

问题

I need some wisdom shed on the controversy over soft-deletes vs hard-deletes in a relational-database-centric application. Commentaries like this, this, and this all discourage soft-deletes as basically a hack, a dirty, easy way out, and claim that hard-deletes are the better way.

I'd love to agree. But nobody seems to mention how to deal with one major, inescapable problem: how can one hard-delete a row when a foreign key constraint prevents its deletion, and when cascade-deleting is not an option?

Essentially, if you want to hard-delete entity A, and entity B is pointing to it, and for business reasons B simply cannot be removed, doesn't that force you to use some form of soft-delete (i.e. keep the record right where it is)?

If you have a customer "Acme" who has made transaction 10001, you can't simply just hard-delete Acme because the virtually any business would require that:

No transaction can be deleted
A transaction must point to the customer that made it

Regardless of how Acme got "deleted" (hard, soft, over-medium) a sales report, for instance, would still have to show that Transaction #10001 was made on 2010-05-01 by customer Acme--even if Acme is no longer around. So it requires that Acme be still stored at least somewhere in the system.

One thread keeps coming up from the advocates of hard-delete: use a proper audit table, and do things like keep a serialized copy of the deleted object in there. But we can't redirect the FK of Transaction #10001 to this particular row in the audit table. What then? Create an "OldCustomers" table that mirrors the schema of "Customers" and somehow redirect every FK in the system that pointed to Acme there? I'm no DB expert so I don't know if that's even possible, but even if it were, now all your reporting queries have to factor in two tables (sounds even worse than having to append AND IsDeleted = false to all queries)

In short, I feel like I'm missing something because hard- and soft-deletes are debated--implying that they are both feasible options. But as I understand it, in 95% of business cases, there is no possibility for using hard-delete (on a technical level, due FK constraints, and on a logical level, simply because a thing can't just disappear when another thing points to it)

回答1:

TL;DR Appropriate tables & constraints arise as part of proper modeling. Your question incorrectly assumes limitations on what the tables & constraints are going to look like.

From a comment of yours:

is it acceptable that that particular transaction should just have NULL for it's FK value? When you query that row, it basically indicates "there is no customer associated with this transaction"

Well, there is no current customer associated with this transaction. If there is a past customer that is associated with this transaction that you want recorded then you have to record it. Ditto for any other data about the customer or the transaction. Proposals to hard delete assume that you keep desired historical data. Do not limit your thinking about achieving this to merely nulling FKs, cascading, adding a flag/date column to an extant table or anything else. Properly model both present & past including database changes that need to occur as a DBMS transaction upon each chosen application situation change. Proposals to soft-delete just involve putting certain current and historical data into the same table vs different ones. This only works for very simple models of current & historical situations.

It is usually straightforward to design a database for only a current application situation. But if we do care about the past we typically only care about some of it. If so, upon certain application situation changes from current to past we can copy a snapshot of the relevant current state into historical state. Labeling data with soft-delete flags vs dates is the combined-table version of undated vs dated historical data, where we only care about current vs past situations and we only care that vs when a change occurred.

"Temporal" databases more or less record the current situation and a bunch of dated once-current situations. This recording of past data using the structure for current data simplifies understanding & querying of current & past data. (The querying about intervals of time that a temporal database can facilitate can get quite complicated.) But it turns out that making a temporal version of a given current-data design does not just involve adding date columns to extant current-data tables. It requires remodeling current data, breaking it into smaller tables with more constraints. This is because different kinds of application situation changes require dating different column combinations of the extant current-data design. (Hard and soft historical snapshot designs must address this, but for a limited past/history.)

The idea behind "no soft deletes" is that it is safer to have two distinct contexts each restricting the mistakes that users and implementers can make. The idea behind "historical" vs "temporal" is that it can be way simpler to model and query a limited predetermined scope of past application situations. Yet the designs for each approach option arise naturally when you apply sound design principles in modeling whatever amount of data about your application's current and past situations you decide you want.

PS If you want to understand temporal databases, read Lorentzos, Date & Darwen and avoid Snodgrass.

回答2:

I use a temporal design which make extensive use of soft deletes. Or get the current data by executing the same query "as of" now. The design allows for hard deletes (rarely used), soft deletes (most often used) and firm deletes (like soft deletes but not allowed to undo). This design allows one to execute a query "as of" a date and return the same result that would have been returned if the query had been executed on that date.

Needless to say, this extremely difficult if "old" data and "current" data are contained in different tables.

Looking over your references, I noticed there was a large concern for performance, yet there were no actual examples of performance hits and the first reference even admitted that "[i]n practice, filtering inactive rows doesn’t cost too much in itself."

This matches my testing of "versioned" tables with millions of versions of 100s of thousands of entities.

I get around the complexity issue by supplying views which exposes only current data and others to expose the entire history (and other views as needed). This is no problem for me as I tend to have apps access views instead of tables anyway.

The general complaint, when you get right down to it, is that using soft deletes makes the db developers' jobs more difficult. This is true. But our job is to make the users' jobs easier -- not our own.

My daughter is a financial analyst at a large bank. She often tells me how much easier her job would be if she could look back in time to see what the data looked back at some particular date in the past. There are archives, of course. But typically this only shows the final state of the data when it was archived. She can't run through the progression as accounts are opened, go through changes and finally get closed. (These are large corporate retirement accounts, not regular checking or savings accounts.)

But she can't get her IT dept interested because it would "be too much work." Sad.

来源：https://stackoverflow.com/questions/40165123/how-can-hard-deletes-work-when-foreign-keys-are-involved

标签

database-design

entity

enterprise

referential-integrity