问题
An application that I'm facing at a customer, looks like this:
- it allows end users to enter "materials".
- To those materials, they can append any number of "properties".
- Properties can have a any value of type: decimal, int, dateTime and varchar (length varying from 5 characters to large chunks of text),
Essentially, the Schema looks like this:
Materials
MaterialID int not null PK
MaterialName varchar(100) not null
Properties
PropertyID
PropertyName varchar(100)
MaterialsProperties
MaterialID
PropertyID
PropertyValue varchar(3000)
An essential feature of the application is the search functionality: end users can search materials by entering queries like:
- [property] inspectionDate > [DateTimeValue]
- [property] serialNr = 35465488
Guess how this performs over the MaterialsProperties-table with nearly 2 million records in it.
Database was initially created under SQL Server 2000 and later on migrated to SQL Server 2005
How can this be done better?
回答1:
- Since users can enter their own property names, i guess every query is going to involve a scan of the properties table (in your example i need to find the propertyid of [inspectionDate]). If the properties table is large, your join would also take a long time. You could try and optimize by denormalizing and storing name with propertyID. This would be a denaormalized column in the MaterialsProperties table.
- You could try adding a property type (int, char etc) to the materialsproperty table and partition the table on the type.
- Look at Object Relational Mapping/Entity Attribute Value Model techniques for query optimization.
- Since you already have a lot of data (2 million records) do some data mining as see if there are repeating groups of properties for many materials. You can them put them in one schema and the rest as the EAV table. Look here for details: http://portal.acm.org/citation.cfm?id=509015&dl=GUIDE&coll=GUIDE&CFID=49465839&CFTOKEN=33971901
回答2:
You could consider separating your MaterialsProperties table by typel e.g. into IntMaterialProperties
, CharMaterialProperties
, etc. This would:
- Partition your data.
- Allow for potentially faster look-ups for integer (or other numeric) type look-ups.
- Potentially reduce storage costs.
You could also introduce a Type
column to Properties
, which you could use to determine which MaterialProperties
table to query. The column could also be used to validate the user's input is of the correct type, eliminating the need to query given "bad" input.
来源:https://stackoverflow.com/questions/1310630/approach-to-generic-database-design