Break atomicity rule storing list of items when there is no reason to query about items? Verses join table that requires whole join table be scanned

问题

This specific case is regarding lists, and items.

The same item may belong to multiple lists, and each list has many items.

Option A (the "proper" way as I understand it): Make a join table, which has list_ID and item_ID. When I want all the items in a list query for list_ID.

Option B (break the atomicity rule): Make a list table. Primary Key, and either repeating columns or non-atomic columns. As I understand it both of these deviations suffer the exact same drawbacks, which is the inability to inefficiently query on items.

As I understand it, normalizing a database to NF-1 would be making sure each column is atomic, so I should NOT do option B. The reason is that would make querying items hard. E.g "how many of this item got sold" or "how many times this this item is in a list"

I don't think I will ever need that data (is that already a mistake / common pitfall? is there a meme amongst experienced database engineers like "you'll always want to be able to query, and you won't have time to reorganize the database when you scale"? Am I overestimating how long itd take MySQL to scan a join table?

I can't find anything to support this but at the same time I feel that its common sense to accept that non-compliant form if I know that the drawback is not important to the application. But I am not an expert and these rules were written by experts.

回答1:

TL; DR If you are going to write generic query/constraint expressions about parts of values of a column then the parts ought to get their own columns; otherwise not. If you change your mind then you can supply the old schema via a view, although typically updates would need recoding.

Relations have rows that have for each column one value of that column's type. Notions of "relation" or "normalized" or "1NF" in terms of some notion of "atomicity" are vague, confused and unhelpful. Eg a list is typically called "non-atomic", while a string is called "atomic"--even though a string is a list of characters. Bite the bullet--it's nonsense.

Every single design has to make this where-the-turtles-stop decision about its column types. It cannot be avoided. (Even when you get to a bit-typed column, you can decide to have a separate table for only the rows that would have 1s.) (Then there is no column hence no column type.) It is a design quality issue, but not an issue of normalization to higher NFs (normal forms).

Nothing in "normalization" to higher NFs (reducing certain update/redundancy problems by replacing a table by tables that join back to it) requires prior application-dependent "normalization" to "1NF" per "atomicity" (replacing a table by some table(s) with columns for parts of its columns). Because when rearranging to "1NF" would help we would need to write constraint expressions involving parts of values of columns so we would be doing that rearrangement anyway.

(Unfortunately "normalized" & "1NF" & "0NF" get used in many different ways, many nonsensical although common. Better to just say what exact property/properties you mean.)

PS 1 A list is generally considered to be an ordered collection of items, a set being an unordered collection of items. Predicate "collection C has item I as Nth member" records a list, "collection C has item I as a member" a set.

PS 2 PS Your presumption that a representation of a list or set "vertically" or "compactly" "down" a column in a table involves any more "scanning" than a representation "horizontally" or "compactly" "across" a value in a column in a row is naive/unjustified. Ie ((list, item1, 1), (list, item2, 2), ...) vs (list, [item1, item2, ...]).

来源：https://stackoverflow.com/questions/44401991/break-atomicity-rule-storing-list-of-items-when-there-is-no-reason-to-query-abou

标签

database-schema

database-normalization

multivalue