DB Design and Data Retrieval from a heavy table

Question

I have a requirement for 612 columns in a database table. The number of columns per data type is:

BigInt – 150 (PositionCol1, PositionCol2, … PositionCol150)

Int – 5

SmallInt – 5

Date – 150 (SourceDateCol1, SourceDateCol2, … SourceDateCol150)

DateTime – 2

Varchar(2000) – 150 (FormulaCol1, FormulaCol2, … FormulaCol150)

Bit – 150 (IsActiveCol1, IsActiveCol2, … IsActiveCol150)

When the user runs an import for the first time, the data is stored in PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1, etc. (plus the other DateTime, Int and SmallInt columns).

When the user runs an import for the second time, the data is stored in PositionCol2, SourceDateCol2, FormulaCol2, IsActiveCol2, etc. (plus the other DateTime, Int and SmallInt columns), and so on.

The table also has a ProjectID column that identifies the project for which data is being imported.

Before starting the import, the user maps the Excel column names to the database column names (PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1), and this mapping is stored in a separate table, so that retrieved data can be shown under the mapped names instead of the DB column names. E.g.

PositionCol1 may be mapped to SAPDATA

SourceDateCol1 may be mapped to SAPDATE

FormulaCol1 may be mapped to SAPFORMULA

IsActiveCol1 may be mapped to SAPISACTIVE

40,000 rows will be added to this table every day. My question is: will SQL Server be able to handle that much data in the long run?

Most of the time, a row will have data in about 200-300 columns; in the worst case it will have data in all 612 columns. With that in mind, should I change the design to avoid future performance issues? If so, what would you suggest?

If I stick with my current design, what should I take care of, apart from indexing, to get optimal performance when retrieving data from this huge table?

If I need to retrieve the data for a particular entity, e.g. SAPDATA, I have to go to my mapping table, look up the database column name mapped to SAPDATA (PositionCol1 in this case), and then retrieve that column. That means I have to write dynamic queries. Is there a better way?
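A minimal sketch of the lookup-then-dynamic-query flow described above, written in Python with SQLite standing in for SQL Server so that it runs on its own. The names ColumnMapping, ImportData, MappedName, DbColumn and fetch_mapped are hypothetical and only illustrate the idea; the real mapping table may look different.

```python
import sqlite3

# Hypothetical names throughout: ColumnMapping, ImportData and their columns
# are illustrative only. SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ColumnMapping (ProjectID INTEGER, MappedName TEXT, DbColumn TEXT);
    CREATE TABLE ImportData (ProjectID INTEGER, PositionCol1 INTEGER, PositionCol2 INTEGER);
    INSERT INTO ColumnMapping VALUES (1, 'SAPDATA', 'PositionCol1');
    INSERT INTO ImportData VALUES (1, 100, NULL), (1, 200, NULL);
""")

# Only identifiers on this whitelist may be spliced into SQL text.
ALLOWED_COLUMNS = {"PositionCol1", "PositionCol2"}

def fetch_mapped(conn, project_id, mapped_name):
    """Resolve a mapped name to its physical column, then read that column."""
    row = conn.execute(
        "SELECT DbColumn FROM ColumnMapping WHERE ProjectID = ? AND MappedName = ?",
        (project_id, mapped_name),
    ).fetchone()
    if row is None or row[0] not in ALLOWED_COLUMNS:
        raise KeyError(mapped_name)
    column = row[0]
    # The column name is dynamic; the project id stays a bound parameter.
    sql = f'SELECT "{column}" FROM ImportData WHERE ProjectID = ? ORDER BY rowid'
    return [r[0] for r in conn.execute(sql, (project_id,))]

values = fetch_mapped(conn, 1, "SAPDATA")
print(values)
```

The point of the whitelist is that a column name cannot be passed as a bound parameter, so it has to be checked before it is concatenated into the statement.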

Answer 1:

Don't stick with your current design. Your repeating groups are unwieldy and self-limiting: what happens when somebody uploads 151 times? Normalise this table so that each row holds one value of each type rather than 150. You won't need the mapping this way, as you can select SAPDATA from the position column without worrying whether it is column 1 or column 150.
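To put rough numbers on how heavy the wide table is, here is a back-of-envelope estimate. The column counts and the 40,000 rows per day come from the question; the byte sizes are SQL Server's fixed storage sizes for those types; the 365-day year and leaving out ProjectID and per-row overhead are simplifying assumptions.

```python
import math

# Column counts are from the question; byte sizes are SQL Server's
# fixed-length storage sizes for each type.
fixed_columns = {
    "bigint":   (150, 8),
    "int":      (5, 4),
    "smallint": (5, 2),
    "date":     (150, 3),
    "datetime": (2, 8),
}
fixed_bytes = sum(count * size for count, size in fixed_columns.values())
bit_bytes = math.ceil(150 / 8)   # bit columns are packed eight to a byte
varchar_max_bytes = 150 * 2000   # worst case: every formula column full

fixed_total = fixed_bytes + bit_bytes   # excludes ProjectID and row overhead
rows_per_year = 40_000 * 365            # assumes imports run every day

print(fixed_total)
print(varchar_max_bytes)
print(rows_per_year)
```

The fixed-width columns alone come to roughly 1.7 KB per row, and in the worst case the 150 Varchar(2000) columns could hold far more than SQL Server's 8,060-byte in-row limit, so long formula values would likely be pushed off-row.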

You probably want a PROJECTS table with an ID, and a PROJECT_UPLOADS table with an ID and an FK to the PROJECTS table. PROJECT_UPLOADS would hold Position, SourceDate, Formula and IsActive, given your use case above.

Then you could do things like

SELECT p.name, pu.position
FROM PROJECTS p
INNER JOIN PROJECT_UPLOADS pu ON pu.projectid = p.id
WHERE pu.position = 'SAPDATA'

etc.
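The normalised layout suggested above could be sketched roughly as follows. This is Python with SQLite standing in for SQL Server, so the column types are only approximations of BigInt, Date, Varchar(2000) and Bit. The table and column names follow the answer; the index name, the "Demo" project and the sample values are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server. Table and column names follow the answer;
# the index name, the "Demo" project and the sample values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE PROJECTS (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE PROJECT_UPLOADS (
        id         INTEGER PRIMARY KEY,
        projectid  INTEGER NOT NULL REFERENCES PROJECTS(id),
        position   INTEGER,   -- BigInt in SQL Server
        sourcedate TEXT,      -- Date
        formula    TEXT,      -- Varchar(2000)
        isactive   INTEGER    -- Bit
    );
    CREATE INDEX ix_uploads_project ON PROJECT_UPLOADS(projectid);
""")
conn.execute("INSERT INTO PROJECTS (id, name) VALUES (1, 'Demo')")

# Each import becomes one row, so a 151st import needs no schema change.
conn.executemany(
    "INSERT INTO PROJECT_UPLOADS (projectid, position, sourcedate, formula, isactive)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, n * 10, "2012-06-01", "A+B", 1) for n in range(1, 152)],
)

upload_count = conn.execute(
    "SELECT COUNT(*) FROM PROJECTS p"
    " INNER JOIN PROJECT_UPLOADS pu ON pu.projectid = p.id"
    " WHERE p.name = ?",
    ("Demo",),
).fetchone()[0]
print(upload_count)
```

Because every import is a row rather than a numbered column group, the queries stay static and the 150-import ceiling disappears.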

Source: https://stackoverflow.com/questions/10846340/db-design-and-data-retrieval-from-a-heavy-table

Tags

sql-server-2008

database-design

query-optimization

data-retrieval