In MySQL, is it good to have 500 columns in one table?

Posted by 一世执手 on 2019-12-13 08:35:19

Question


In MySQL, is it good to have 500 columns in one table? Rows will be added daily, and the row count will never exceed 1 million.

To give a brief overview, these are my column headers:

Table name: process_detail

Columns: id, process_id, item_id, item_category, attribute1, attribute2, attribute3, ..., attribute500, user_id1_update_time, user_id2_update_time, user_id1_comments, user_id2_comments

All attributes are varchar with a maximum length of 30; the actual values are shorter than that.

I also have 25 item categories (item_category).

Should I create one single table like 'process_detail', or should I create multiple tables, one per category, like category1_process_detail, category2_process_detail, ..., category25_process_detail?

Each item_category has a different number of columns: some will have 200 columns and some only 50. Let's say category1 will have 200 columns and category2 will have 50.

Some columns in category1 will not be in category2.

item_ids in process_id1 may also appear in process_id2. What is the best approach in terms of performance? Please advise.


Answer 1:


No, it is not a good idea. Instead, use a many-to-one relational mapping.

For example, create the process_detail table as you propose but without the attribute columns. Then create another table, process_detail_attributes:

CREATE TABLE `process_detail_attributes` (
  `pda_id` INT NOT NULL AUTO_INCREMENT,
  `id` INT NOT NULL,  -- points at the owning process_detail row
  `attribute_key` INT NOT NULL,  -- which attribute this row holds
  `attribute_value` VARCHAR(30) NOT NULL,
  PRIMARY KEY (`pda_id`),
  FOREIGN KEY (`id`) REFERENCES `process_detail` (`id`)
) ENGINE...

Then, for each attribute (attribute1...attribute500) that is actually needed, insert a row into the attribute table with the appropriate id in the foreign key column.

The benefits of doing this are numerous. The link Doomenik mentions is probably a good starting point to understand why, but to put it tersely...

-If not all attributes are used, no storage space is wasted on the unused ones.
-In the single wide table, even if every attribute is used, all of that data is stored in the leaf nodes of the clustered index B-Tree itself. This greatly inflates the amount of data per row, so fewer rows fit in each page, less of the table fits in the buffer pool (i.e. RAM), and key locality suffers. Index traversal slows down as a result.
-If these attributes are going to require indexes (which attributes often do), a table with 500 attribute columns quickly becomes unmanageable.

There are of course times when you can consider de-normalization for the sake of performance but this does not seem like one of them.

You can then select the data from process_detail with all of its attributes like this:

SELECT a.process_id,
a.user_id1_update_time,
a.user_id2_update_time,
a.user_id1_comments,
a.user_id2_comments,
b.*
FROM process_detail a INNER JOIN process_detail_attributes b
ON a.id = b.id
WHERE whatever_condition_you_want_to_filter_by_here;
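As a rough end-to-end sketch of the two-table layout described above, the following Python script builds both tables, inserts only the attributes that are actually set, and reads them back with the join. It uses SQLite in memory purely as a stand-in so that it runs without a MySQL server; the sample values and the choice of attribute keys 1, 7 and 42 are made up for illustration, and the MySQL DDL differs slightly (AUTO_INCREMENT, ENGINE clause).

```python
import sqlite3

# SQLite in-memory database used only as a stand-in for MySQL (assumption),
# so the sketch runs anywhere without a server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE process_detail (
    id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL,
    item_category INTEGER NOT NULL
);
CREATE TABLE process_detail_attributes (
    pda_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER NOT NULL REFERENCES process_detail (id),
    attribute_key INTEGER NOT NULL,
    attribute_value VARCHAR(30) NOT NULL
);
""")

# One parent row; the values are invented for illustration.
conn.execute(
    "INSERT INTO process_detail (id, process_id, item_id, item_category) "
    "VALUES (1, 10, 100, 2)"
)

# Only attributes that are actually set get a row: 3 of the possible 500 here.
attrs = {1: "red", 7: "large", 42: "steel"}
conn.executemany(
    "INSERT INTO process_detail_attributes (id, attribute_key, attribute_value) "
    "VALUES (?, ?, ?)",
    [(1, key, value) for key, value in attrs.items()],
)

# Fetch the parent row together with all of its attribute rows.
rows = conn.execute(
    """
    SELECT a.process_id, b.attribute_key, b.attribute_value
    FROM process_detail a
    INNER JOIN process_detail_attributes b ON a.id = b.id
    WHERE a.id = ?
    ORDER BY b.attribute_key
    """,
    (1,),
).fetchall()
print(rows)
```

Each result row carries one attribute, so an item with 3 attributes comes back as 3 rows rather than one wide row; the application has to regroup them if it wants them side by side.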



Answer 2:


InnoDB won't support 500 varchar columns, because of the way rows are stored. Even if you use InnoDB's ROW_FORMAT=DYNAMIC, this would store 500 x 20 bytes = 10,000 bytes per row for the varchars, which is greater than the 8KB row size limit. See https://www.percona.com/blog/2010/02/09/blob-storage-in-innodb/ for more details on InnoDB row storage.
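The arithmetic behind that claim can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted above (500 columns, 20 bytes each, a limit of roughly 8KB); it reads "8KB" as 8192 bytes and ignores the other columns and any per-row overhead, both of which are simplifying assumptions.

```python
# Figures taken from the answer; treating "8KB" as exactly 8192 bytes is an
# approximation, and other columns and row overhead are ignored.
NUM_COLUMNS = 500
BYTES_PER_COLUMN = 20
ROW_SIZE_LIMIT = 8 * 1024

row_bytes = NUM_COLUMNS * BYTES_PER_COLUMN
over_limit = row_bytes > ROW_SIZE_LIMIT
print(row_bytes, ROW_SIZE_LIMIT, over_limit)  # 10000 8192 True

# Rough upper bound on the column count that stays within the limit
# under the same simplified assumptions.
max_columns = ROW_SIZE_LIMIT // BYTES_PER_COLUMN
print(max_columns)  # 409
```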

Having such a large number of columns is a red flag for problematic database design anyway.

  • If you store numerous columns for similar attributes, you're violating the principle of eliminating Repeating Groups of columns, which is part of making a table satisfy First Normal Form.

  • If the columns are not similar attributes, then you're simply not designing a relation. In a relation, you must define the heading with meaningful column names and data types. When you name your columns generically like attribute1, etc., you're not designing the table in a relational way.

I disagree with suggestions to use an EAV table design. I have posted frequently, here on Stack Overflow and on my blog ("EAV FAIL"), about the fact that EAV is a broken design for a relational database.

See my answer to https://stackoverflow.com/a/695860/20860 or my presentation Extensible Data Modeling for some alternative solutions to your task of storing different attributes for different process types.

You might like to read about using the JSON data type in MySQL 5.7 to store semi-structured collections of attributes specific to each of your different process types.
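To give a feel for the JSON idea, here is a minimal sketch in which each row keeps its category-specific attributes as one JSON document, so categories with different attribute sets can share a single table. It again uses SQLite in memory as a stand-in so that it runs without a server: in MySQL 5.7 the `attributes` column would be declared with the JSON type, whereas here it is plain TEXT holding serialized JSON. The attribute names (color, size, material, voltage) are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# In MySQL 5.7+ the `attributes` column would use the JSON type; SQLite has
# no such type, so TEXT holding serialized JSON stands in for it (assumption).
conn.execute(
    """
    CREATE TABLE process_detail (
        id INTEGER PRIMARY KEY,
        process_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        item_category INTEGER NOT NULL,
        attributes TEXT NOT NULL
    )
    """
)

# Hypothetical attribute names; each category stores only the keys it uses.
items = [
    (1, 10, 100, 1, {"color": "red", "size": "large", "material": "steel"}),
    (2, 10, 101, 2, {"voltage": "220"}),
]
conn.executemany(
    "INSERT INTO process_detail VALUES (?, ?, ?, ?, ?)",
    [(i, p, item, cat, json.dumps(attrs)) for i, p, item, cat, attrs in items],
)

# Read the documents back and decode them in the application.
decoded = {
    row_id: json.loads(doc)
    for row_id, doc in conn.execute("SELECT id, attributes FROM process_detail")
}
print(decoded[2])
```

The fixed columns (process_id, item_id, item_category) stay ordinary typed columns; only the part that varies per category moves into the document.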



Source: https://stackoverflow.com/questions/45611014/in-mysql-is-it-good-to-have-500-columns-in-one-table
