Question
I am working with a large project that has many objects that represent simple (non-related) values. Sometimes these values are a single string, sometimes they are two strings, sometimes a string and an int...
Currently we have a 'values' table in our relational database that contains the columns: Id, Category, String1, String2, ..., Int1, Int2, ..., Double1, etc. It's convenient, but a mess.
The values all have the following properties:
- Every object with the same Category has the same attributes (i.e. is typed).
- No objects are related (the only key is the Id primary key).
How do we navigate out of this mess? As I see it, our options are as follows:
- Just keep adding columns as necessary and forget about semantic mapping between table and object. Just pile it on.
- Create a new table for every value object. This will add a large number of tables to the database, most of which will have fewer than 6 rows. I'm worried about the noise that all these extra tables add to the database.
- Deploy a schema-free database just for these objects (not really a possibility with our deployment scenarios).
- Create a table with Id and Category columns and a BLOBValue column, and serialize the value objects into the BLOBValue column. Is this viable?
This post restates our options. Are there any caveats or pitfalls to using serialization? Is there an option I'm not aware of? Advice most welcome.
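For concreteness, the serialization option in the last bullet might look like the minimal sketch below. It uses Python's built-in sqlite3 and JSON purely for illustration; the table name value_objects, the column blob_value, the helper functions and the sample categories are all assumptions, not part of our actual system. Note that the database sees only opaque bytes, so any sorting on an attribute has to happen in application code.

```python
import json
import sqlite3

# In-memory database; the schema below is a hypothetical stand-in
# for the proposed Id / Category / BLOBValue table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE value_objects ("
    " id INTEGER PRIMARY KEY,"
    " category TEXT NOT NULL,"
    " blob_value BLOB NOT NULL)"
)


def save(category, value):
    """Serialize a value object (a plain dict here) to JSON bytes and store it."""
    payload = json.dumps(value).encode("utf-8")
    cur = conn.execute(
        "INSERT INTO value_objects (category, blob_value) VALUES (?, ?)",
        (category, payload),
    )
    return cur.lastrowid


def load_category(category):
    """Load and deserialize every value object of one category."""
    rows = conn.execute(
        "SELECT blob_value FROM value_objects WHERE category = ? ORDER BY id",
        (category,),
    )
    return [json.loads(blob.decode("utf-8")) for (blob,) in rows]


save("Currency", {"code": "USD", "decimals": 2})
save("Currency", {"code": "JPY", "decimals": 0})
save("Country", {"name": "France", "iso": "FR"})

currencies = load_category("Currency")
# The database cannot order by "decimals": it is buried inside the blob,
# so the sort has to be done after deserializing.
by_decimals = sorted(currencies, key=lambda v: v["decimals"])
```

The round trip itself is simple; the cost shows up as soon as a query needs to filter, sort or aggregate on something inside the serialized value.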
Answer 1:
I stumbled upon this by navigating from another relevant question. Although it's quite old, I was intrigued enough to answer because it not only poses a very well stated problem but also allows one to argue about database denormalization as a whole.
There are many reasons and even more excuses for a database to be denormalized. Performance might be the most important, but difficulty in data classification (such as the issue at hand) is definitely the most common. Moreover, there are many ways a database can be denormalized, and a good deal of them are addressed by the OP.
Fact is, though, that a database should be denormalized as a last resort, after everything else has failed. The reasons for that include:
- Data become meaningless to humans as well as the RDBMS. It's hard for someone to understand, or even remember, the purpose of a field named Integer1 or a serialized value which can potentially hold anything. And the RDBMS cannot extract values from serialized entities in order to sort results or apply aggregates.
- Maintaining a volatile schema is hard. There's a reason why a database schema should be constant: other, higher levels depend on it. If the schema changes overnight, applications must change too, to reflect the new status. Even worse, views, stored procedures and other dependent database components become equally difficult to maintain.
- Constraints cannot be enforced, indexes cannot be created. There's no point defining a serialized field as a foreign key, or confining it to a specific set of values. This cancels a great deal of the database's self-protection mechanisms. Less data integrity means more administrative cost. Moreover, an index would be equally useless here, making the table less open to optimization.
- Metadata will have to, eventually, be stored as data. Imagine a multilingual CMS in which there's a main article table to hold articles. Now, for every language supported, there's a corresponding article_{lang} table to hold translations (e.g. article_en, article_fr, article_es etc.). In order to record the existing translations of articles, a "relation" table should be created, with a foreign key to the article table, a language id, a table name for the translation table and a field that should be a FK to the translation table but cannot be defined as one. Then, try to write a query that counts the available translations for each article!
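For contrast with the CMS scenario just described, here is a minimal sketch of a normalized layout in which the language is a column rather than part of a table name. It uses Python's built-in sqlite3 only for illustration; the article_translation table, its columns and the sample rows are assumptions of mine, not something taken from a real CMS. With this layout the foreign key can actually be declared, and counting translations per article is a single GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE article (
    id   INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE
);
-- One table for all languages: the language is data, not metadata.
CREATE TABLE article_translation (
    article_id INTEGER NOT NULL REFERENCES article(id),
    lang       TEXT NOT NULL,
    title      TEXT NOT NULL,
    PRIMARY KEY (article_id, lang)
);
""")

conn.executemany(
    "INSERT INTO article (id, slug) VALUES (?, ?)",
    [(1, "intro"), (2, "setup"), (3, "draft")],
)
conn.executemany(
    "INSERT INTO article_translation (article_id, lang, title) VALUES (?, ?, ?)",
    [(1, "en", "Introduction"), (1, "fr", "Introduction"), (2, "en", "Setup")],
)

# Count the available translations for each article, including articles
# that have none (LEFT JOIN plus COUNT over a column from the joined table).
counts = conn.execute("""
    SELECT a.slug, COUNT(t.lang)
    FROM article a
    LEFT JOIN article_translation t ON t.article_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# The declared foreign key rejects a translation for a missing article.
try:
    conn.execute(
        "INSERT INTO article_translation (article_id, lang, title) "
        "VALUES (99, 'es', 'Huerfano')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In the per-language-table design neither the count query nor the integrity check can be expressed without consulting table names stored as data.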
So avoid denormalization as much as possible. If entities can be classified to an extent, then IS-A relations could be the answer. To support arbitrary attributes, or when classification is just not worthwhile, a key/value pair table, with a foreign key to the table holding normalized data, is more than enough of a sacrifice.
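A key/value pair table of the kind mentioned above could be sketched as follows, again with Python's built-in sqlite3. The table names value_object and value_attribute, the add_value helper and the currency sample data are hypothetical, chosen only to mirror the question's Id and Category columns. Attribute values are stored as text here, which is one of the compromises of this design: types have to be restored with a cast at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
-- Normalized part: one row per value object.
CREATE TABLE value_object (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL
);
-- Key/value part: one row per attribute, tied to its owner by a real FK.
CREATE TABLE value_attribute (
    value_id INTEGER NOT NULL REFERENCES value_object(id) ON DELETE CASCADE,
    name     TEXT NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (value_id, name)
);
""")


def add_value(category, attributes):
    """Insert one value object plus one key/value row per attribute."""
    cur = conn.execute(
        "INSERT INTO value_object (category) VALUES (?)", (category,)
    )
    value_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO value_attribute (value_id, name, value) VALUES (?, ?, ?)",
        [(value_id, name, str(val)) for name, val in attributes.items()],
    )
    return value_id


add_value("Currency", {"code": "USD", "decimals": 2})
add_value("Currency", {"code": "JPY", "decimals": 0})

# Unlike a serialized blob, individual attributes stay visible to SQL,
# so the database itself can sort on them (after a cast back to integer).
codes_by_decimals = [row[0] for row in conn.execute("""
    SELECT c.value
    FROM value_object o
    JOIN value_attribute c ON c.value_id = o.id AND c.name = 'code'
    JOIN value_attribute d ON d.value_id = o.id AND d.name = 'decimals'
    WHERE o.category = 'Currency'
    ORDER BY CAST(d.value AS INTEGER)
""")]
```

Each attribute needs its own join, so queries grow with the number of attributes involved; that is the price paid for keeping arbitrary attributes queryable without changing the schema.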
Source: https://stackoverflow.com/questions/15650898/how-to-store-value-objects-in-a-relational-database