Is it really better to use normalized tables?


Question


I heard my team leader say that in some past projects they had to do away with normalization to make the queries faster.

I think it may have something to do with table joins.

Is having many lean tables really less efficient than having a few fat tables?


Answer 1:


It depends ... joining tables is inherently slower than having one big table that is 'pre-joined', i.e. de-normalised. However, by denormalising you create data duplication and your tables become larger. Normalisation is seen as a good thing because it produces databases that can answer 'any' question: if it is properly done, you can build a SELECT to get at your data. That is not the case in some other forms of database, but those are now (mostly) historical irrelevancies; the normalised, relational DB won that battle.
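For illustration, here is a minimal sketch of the two designs (all table and column names are hypothetical): the normalised version needs a join at query time, while the denormalised version is 'pre-joined' at the cost of duplicating the customer name on every order.

    -- Normalised: customer data stored once, joined at query time
    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100)
    );

    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT REFERENCES customers(customer_id),
        total       DECIMAL(10, 2)
    );

    SELECT o.order_id, c.name, o.total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;

    -- Denormalised: 'pre-joined', no join needed, but the name is duplicated per order
    CREATE TABLE orders_denorm (
        order_id      INT PRIMARY KEY,
        customer_id   INT,
        customer_name VARCHAR(100),  -- duplicated from customers
        total         DECIMAL(10, 2)
    );

    SELECT order_id, customer_name, total
    FROM orders_denorm;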

Back to your question: using de-normalisation to make things go faster is a well-accepted technique. It's normally best to run your DB for a while so you know what to de-normalise and what to leave alone, and it's also common to keep the data in its 'correct' normalised form and pull it into a set of de-normalised reporting tables on a regular basis. If that process is done as part of the report run itself, the data is always up to date too.
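A sketch of that reporting pattern, reusing the hypothetical orders/customers tables above: the normalised tables remain the source of truth, and the denormalised reporting table is simply rebuilt on a schedule (or as the first step of the report run).

    -- Denormalised reporting table, rebuilt periodically from the normalised source
    CREATE TABLE order_report (
        order_id      INT,
        customer_name VARCHAR(100),
        total         DECIMAL(10, 2)
    );

    TRUNCATE TABLE order_report;

    INSERT INTO order_report (order_id, customer_name, total)
    SELECT o.order_id, c.name, o.total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;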

As an example of over-normalisation, I've seen DBs in the past where the days of the week and the months of the year were pulled out into separate tables; the dates themselves had been normalised. You can go too far.
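A sketch of what that kind of over-normalisation looks like (hypothetical names): date parts split into lookup tables, so that two extra joins are needed just to display a date that a single DATE column would hold.

    -- Over-normalised: weekday and month names pulled out into lookup tables
    CREATE TABLE weekdays (weekday_id INT PRIMARY KEY, weekday_name VARCHAR(10));
    CREATE TABLE months   (month_id   INT PRIMARY KEY, month_name   VARCHAR(10));

    CREATE TABLE events (
        event_id     INT PRIMARY KEY,
        weekday_id   INT REFERENCES weekdays(weekday_id),
        month_id     INT REFERENCES months(month_id),
        day_of_month INT,
        event_year   INT
    );

    -- Two joins just to reconstruct a date
    SELECT e.event_id, w.weekday_name, m.month_name, e.day_of_month, e.event_year
    FROM events e
    JOIN weekdays w ON w.weekday_id = e.weekday_id
    JOIN months   m ON m.month_id   = e.month_id;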




Answer 2:


You should do some research on the differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) databases.

In short, database systems which are concerned primarily with recording transactions (OLTP) are usually structured in a more normalized fashion, reducing data duplication and easing the creation and updating of records at the expense of optimized data retrieval.

Database systems which are more concerned with data retrieval and analysis (OLAP) are usually structured in a less normalized fashion, sacrificing data-storage efficiency so as to maximize querying and analysis speed.
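To make the contrast concrete, a hedged sketch (the table names are hypothetical): OLTP workloads are dominated by small writes against normalised tables, while OLAP workloads are dominated by large scans and aggregations over wide, denormalised tables.

    -- OLTP: small, frequent writes against normalised tables
    INSERT INTO orders (order_id, customer_id, total)
    VALUES (1001, 42, 19.99);

    -- OLAP: large scans and aggregations over a wide, denormalised table
    SELECT region, product_category, SUM(revenue) AS total_revenue
    FROM sales_wide               -- hypothetical denormalised fact table
    GROUP BY region, product_category;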

Database normalization and denormalization are at the heart of this trade-off.




Answer 3:


Jeff wrote about this, and a heated discussion followed. It is also the subject of much discussion on SO, e.g. "What's the better database design: more tables or more columns?". As others have pointed out, use common sense and do not over-normalize.




Answer 4:


In my long experience with Oracle OLTP databases, some of them very large and busy, I can honestly say I can't remember ever having come across a case where "denormalisation for performance" was truly required. I have, however, seen many cases where someone decided in advance that denormalisation should be applied out of fear, uncertainty and doubt about potential performance issues. This was usually done without any benchmarking, and invariably I found that no performance improvement was actually achieved; the data-maintenance code, however, became far more complex than it would otherwise have been.

OLAP is a very different animal, and I'm not in a position to comment about that.




Answer 5:


This question recurs altogether too often. The primary reason is that SQL, the most popular database language by a huge margin, and all of its most popular implementations, conflate logical table design with physical table design.

The eternal answer is that you should always normalize your logical tables, but the pragmatic answer is complicated by the fact that the only way to implement certain optimizations under existing SQL implementations is to denormalize your physical table design (itself not a bad thing), which, in those implementations, also requires denormalizing your logical table design.

In short, it depends. Sometimes denormalization is important for performance, but, like everything else performance-related, you should measure, measure, measure before you even consider going down this route.
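As a sketch of what "measure" means in practice (PostgreSQL's EXPLAIN ANALYZE is assumed here; other databases have equivalent facilities, and the orders/orders_denorm tables are the same hypothetical ones sketched in the first answer), compare the actual plans and timings of the normalised query and its denormalised counterpart before committing to either design:

    -- Measure the normalised query...
    EXPLAIN ANALYZE
    SELECT o.order_id, c.name, o.total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.total > 100;

    -- ...and the denormalised one, then compare real timings rather than guessing
    EXPLAIN ANALYZE
    SELECT order_id, customer_name, total
    FROM orders_denorm
    WHERE total > 100;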




Answer 6:


Performance is inversely related to the amount of normalization applied in an RDBMS. That being said, the more normalized the tables are, the lower the likelihood of errors. There is also a point at which denormalization itself hurts RDBMS performance, the extreme case being when all of the data is held in one table.




Answer 7:


The reason normalization has been known to hurt performance is that joins are fairly expensive. If there are N records in table X and M records in table Y, then a join of X and Y conceptually produces an intermediate result with as many as N*M records. Although the database uses optimization tricks to avoid generating the entire intermediate table when it isn't needed, it nevertheless has to process the records involved.
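A small worked example of that cardinality argument (tables x and y are hypothetical): with no join predicate the result is exactly N*M rows, and even with an equi-join predicate the engine may, in the absence of a useful index, have to consider on the order of N*M combinations.

    -- x has N rows, y has M rows
    SELECT COUNT(*)
    FROM x CROSS JOIN y;          -- exactly N * M rows

    -- An equi-join returns only the matching pairs, but without an index on y.x_id
    -- the engine may still have to examine roughly N * M candidate combinations
    SELECT x.id, y.val
    FROM x
    JOIN y ON y.x_id = x.id;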

Denormalization is the process whereby you put data that is often used together into a single table to increase performance, at the expense of some database purity. Most find it an acceptable trade-off, some even going so far as to design the schema as intentionally denormalized from the start, skipping the intermediate normalized step.



Source: https://stackoverflow.com/questions/540136/is-it-really-better-to-use-normalized-tables
