Question
I have a DB with more than 100K records: a lot of categories and many items (with different properties per category). Everything is stored in an EAV schema.
If I break this scheme and create a unique table for each category, is that something I should avoid?
Yes, I know I'll probably end up with a lot of tables and will need to ALTER them whenever I want to add an extra field, BUT is that so wrong?
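For concreteness, here is a minimal sketch of the two designs being compared (all table and column names are hypothetical):

```sql
-- Current approach: one generic EAV table covering every category.
CREATE TABLE item_attributes (
    item_id   INT NOT NULL,
    attribute VARCHAR(100) NOT NULL,
    value     VARCHAR(4000),
    PRIMARY KEY (item_id, attribute)
);

-- Considered alternative: one typed table per category,
-- with that category's properties as real columns.
CREATE TABLE laptops (
    item_id   INT PRIMARY KEY,
    cpu       VARCHAR(100),
    ram_gb    INT,
    screen_in DECIMAL(4, 1)
);
```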
I have also read that the more tables I have, the more files the DB creates, and that this isn't good for any filesystem.
Any suggestions?
Answer 1:
Used as the primary structure of a database design, EAV will fail as the data grows. The way you know a database schema does not fit the business model is when you need to query it for reporting. EAVs require many workarounds and non-native database functionality to produce reasonable reports; you are constantly writing crosstab/pivot queries for even the smallest request. All the processing needed to put EAV rows into a queryable format chews through CPU cycles and is highly prone to error. In addition, the data volume is multiplied by the attribute count: with 10 attributes, 10 rows in a standard design generate 100 EAV rows, 100 standard rows generate 1,000 EAV rows, and so on.
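To illustrate the pivoting overhead, compare a trivial filter against a standard table with the same filter against an EAV store (hypothetical names):

```sql
-- Standard design: attributes are columns, the filter is direct and indexable.
SELECT name, weight
FROM products
WHERE color = 'red';

-- EAV design: every attribute needs its own self-join (or a MAX(CASE ...)
-- pivot) before the same filter can even be expressed.
SELECT e_name.value   AS name,
       e_weight.value AS weight
FROM entity_attributes e_color
JOIN entity_attributes e_name
  ON e_name.entity_id = e_color.entity_id AND e_name.attribute = 'name'
JOIN entity_attributes e_weight
  ON e_weight.entity_id = e_color.entity_id AND e_weight.attribute = 'weight'
WHERE e_color.attribute = 'color'
  AND e_color.value = 'red';
```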
Database management systems are designed to handle lots of tables, so this should not be a worry.
It is possible to create a hybrid solution where an EAV structure is part of the design. However, the rule must be that you can never include a predicate like [AttributeCol] = 'Attribute' in a query. That is, you can never filter on, sort on, or restrict the range of any attribute, and you cannot place a specific attribute anywhere in a report or on screen. It is just a blob of data. Combined with a good schema for the rest of the system, an EAV that stores a blob of data can be useful. The key to making this work is enforcement, among yourself and the developers, so that no one ever crosses the line of filtering or sorting on an attribute. Once you go down the dark path, forever will it dominate your destiny.
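A minimal sketch of such a hybrid, assuming hypothetical names: everything the business actually queries lives in a normal relational table, and only the opaque leftovers go into the EAV side table.

```sql
-- Queryable facts stay relational: these can be indexed, filtered, and sorted.
CREATE TABLE items (
    item_id     INT PRIMARY KEY,
    category_id INT NOT NULL,
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10, 2)
);

-- Opaque per-item extras: read and written as a whole, never referenced
-- in a WHERE, ORDER BY, or JOIN condition.
CREATE TABLE item_extra_attributes (
    item_id   INT NOT NULL REFERENCES items (item_id),
    attribute VARCHAR(100) NOT NULL,
    value     VARCHAR(4000),
    PRIMARY KEY (item_id, attribute)
);
```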
Answer 2:
There are database engines purpose-built to run EAV models. I don't know them, so I can't recommend one. But shoving an EAV model into a relational engine is a recipe for disaster. Disaster will occur; it's really just a matter of time.
It's possible that your data will stay small enough, and your queries simple enough, for this to work, but that's rarely the case.
Answer 3:
An EAV schema is very flexible for adding new "columns", but at the cost of degraded query performance and the loss of the business logic that a relational schema would otherwise encode.
You have to create multiple views just to pivot the results, which becomes a performance problem once the table holds billions of rows. Another trait of EAV schemas is that every query joins the data table with the metadata table, and there may be multiple joins against the same data table.
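The pivoting views mentioned above commonly look something like this (hypothetical names; MAX(CASE ...) is one standard way to flatten EAV rows):

```sql
-- A view that flattens EAV rows with conditional aggregation.
-- Every query through the view pays the metadata join plus the
-- full GROUP BY, which is what hurts at billions of rows.
CREATE VIEW item_flat AS
SELECT d.entity_id,
       MAX(CASE WHEN m.attribute_name = 'name'   THEN d.value END) AS name,
       MAX(CASE WHEN m.attribute_name = 'color'  THEN d.value END) AS color,
       MAX(CASE WHEN m.attribute_name = 'weight' THEN d.value END) AS weight
FROM entity_values d
JOIN attribute_metadata m ON m.attribute_id = d.attribute_id
GROUP BY d.entity_id;
```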
This is based on my experience.
Answer 4:
I took this approach on an authoring system I built for e-learning about 4 years ago. I didn't know I was doing EAV at the time; I thought I was being sly just using name/value pairs. I figured I'd have more records but less redesign, as I had grown thoroughly tired of adjusting columns every time we had a change request.
My first test was building the system's hierarchy in one table. That performed great with about 4 projects, 25 products, and 4 to 5 tools each, all linked through tier integers that point back to their primary keys.
I've been recording assets that pass through the system (FLV, SWF, JPG, PNG, GIF, PDF, MP3, etc.) along with all the MIME-type specifics about them, which comes to just 4 to 10 attributes per file. It has totaled up to 8 million "asset data" records for roughly 800K assets (est.). Then I got a request to put all that information into columns for a report. The SQL statement has to do a number of self-joins on the table, let alone a further slew of JOINs if they want to know the content, product, or project an asset was used in.
From a granular perspective it works great. From an Excel-report perspective, put your seat belt on. I've mitigated it by taking snapshots out to tables that reflect the data the way someone wants it in a report, but it takes a while to compile that information, which required me to offload (SQL dump) to another server.
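A sketch of that snapshot approach, with hypothetical names: the pivot cost is paid once, offline, instead of on every report query.

```sql
-- Materialize the pivoted report once, off-hours, instead of paying
-- the self-join cost on every export.
CREATE TABLE asset_report_snapshot AS
SELECT a.asset_id,
       a.file_name,
       MAX(CASE WHEN d.attribute = 'mime_type' THEN d.value END) AS mime_type,
       MAX(CASE WHEN d.attribute = 'width'     THEN d.value END) AS width,
       MAX(CASE WHEN d.attribute = 'height'    THEN d.value END) AS height
FROM assets a
JOIN asset_data d ON d.asset_id = a.asset_id
GROUP BY a.asset_id, a.file_name;

-- Reports then read the flat snapshot directly.
SELECT * FROM asset_report_snapshot WHERE mime_type = 'video/x-flv';
```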
I've found myself asking whether this was the right thing to do, and for this project, up until this grand-scale report request, I could say "yes". But correlating it all makes the server sweat pretty badly. It really depends on how deep their queries go.
I've dabbled in SQL since 2002, using it in supporting tools rather than anything on a huge scale, and it has survived. If this were a larger, million-user, terabyte-plus database, I'd probably be pulling my hair out.
Special note: I found out this system was on Red Hat, and it was 32-bit. Many of the PHP processing threads could not run on more than 1 CPU core, while the server had 7 more cores sitting idle! Queries that took up to 45 minutes on this machine could run in 14-25 seconds on a properly configured 64-bit system. More food for thought when considering performance.
Source: https://stackoverflow.com/questions/2668011/eav-database-scheme