Question
UPDATE: part 1 - start:
We want to collect our customers' data, but which kind of data? It is not specified! What we actually want is a dynamic application in which the admin can define new questions (setting the answer's data type, length, and other rules) and can also deactivate old questions.
(I don't want to use an EAV schema design, but I can't find an alternative.)
UPDATE: part 1 - end:
So I decided to create a quiz application in which the admin can define questions whose answers can have different data types.
Example:
question ID 1 : what is your name? answer: John (varchar)
question ID 2 : how old are you? answer: 25 (integer)
question ID 3 : how much is your salary per hour? answer: 30.65 (decimal)
question ID 4 : describe yourself? answer: I'm so kind ... (text)
UPDATE: part 2 - start:
To save the answers, three choices come to mind:
UPDATE: part 2 - end:
- create a table like this (EAV schema design):
Table profile_answers
id int [pk, increment]
question_id int
profile_id int
answer text
As you can see, I've saved all answers as text. I know that it works, but is it the best way? This app is going to have millions of answers, and we want to analyze them by machine (for example: getting the customers' average age, finding married customers, finding customers with salary > 30.52, etc.), so I want to implement the approach with the best performance.
- create a table like this (EAV schema design):
Table answers
id int [pk, increment]
question_id int
profile_id int
integer_value int
decimal_value decimal
varchar_value varchar
boolean_value boolean
longtext_value longtext
Now I can save each answer in the proper field and put NULL in the other fields (the answer's data type is determined when its question is defined).
For example:
question ID 1 : what is your name? answer: John (varchar)
id 1
question_id 1
profile_id 1
integer_value NULL
decimal_value NULL
varchar_value John
boolean_value NULL
longtext_value NULL
----------------
question ID 2 : how old are you? answer: 25 (integer)
id 2
question_id 2
profile_id 1
integer_value 25
decimal_value NULL
varchar_value NULL
boolean_value NULL
longtext_value NULL
----------------
question ID 3 : how much is your salary per hour? answer: 30.65 (decimal)
id 3
question_id 3
profile_id 1
integer_value NULL
decimal_value 30.65
varchar_value NULL
boolean_value NULL
longtext_value NULL
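The second option can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, and the QUESTION_COLUMN mapping and the save_answer helper are hypothetical names that do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        question_id INTEGER NOT NULL,
        profile_id INTEGER NOT NULL,
        integer_value INTEGER,
        decimal_value NUMERIC,
        varchar_value TEXT,
        boolean_value INTEGER,
        longtext_value TEXT
    )
""")

# Hypothetical lookup: the typed column chosen when each question was defined.
QUESTION_COLUMN = {1: "varchar_value", 2: "integer_value", 3: "decimal_value"}


def save_answer(profile_id, question_id, value):
    """Store the value in the question's typed column; the others stay NULL."""
    column = QUESTION_COLUMN[question_id]  # taken from a fixed whitelist, never from user input
    conn.execute(
        f"INSERT INTO answers (question_id, profile_id, {column}) VALUES (?, ?, ?)",
        (question_id, profile_id, value),
    )


save_answer(1, 1, "John")
save_answer(1, 2, 25)
save_answer(1, 3, 30.65)
save_answer(2, 2, 35)  # a second profile answering the age question

# Aggregates can run directly on the typed column, with no casting from text.
average_age = conn.execute(
    "SELECT AVG(integer_value) FROM answers WHERE question_id = 2"
).fetchone()[0]
print(average_age)
```

The analysis queries mentioned earlier (average age, salary thresholds) then operate on native numeric columns, at the cost of several NULL columns per row.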
UPDATE: part 3 - start:
- when the admin adds a new question, I can add a column to the answers table (changing the table's schema)
UPDATE: part 3 - end:
It's all about performance! Which way is best? If none of them is the best one and I should redesign the database, what is the correct database design?
Answer 1:
Non-EAV
What is the datatype for? The user typed the Answer as a string; he did not specify the datatype, did he? That avoids EAV -- Just store strings. At that point, the question can simply be a TEXT column in one table. And the answer a column in another table, such as the profile_answers you proffered.
As for queries like "getting customers average age, getting married customers, customers with salary > 30.52", you are stuck with table scans, whether it is EAV or not. The non-EAV approach will be more efficient since there are fewer hoops to go through to get to the value.
You have a Customers table; one row per customer, with birthdate as a column. Average age involves reading that column and doing some simple arithmetic. Ditto for marital status and salary.
In other words, the UI has a <form> that asks about "customer info". (And other forms that ask about other stuff.) Then the answers for one customer go directly into a table designed specifically for customer info.
You could have that form and the table be "table-driven". But that is unnecessarily complex. If, as I first understood your Question, you are building a classroom app that involves test-taking with hundreds of questions, that might lead to something table-driven.
EAV
If you do stick with 5 datatypes, then you have these tough questions: How much precision in your DECIMAL? If the Question is "What is the value of pi", then how do you handle this variety: 3.14, 3.1416, 3.14159, 3.14159265358979, 22/7, "about 3.14", etc? Only VARCHAR(...) or TEXT handles those.
What did you mean by "(for example: getting customers average age, getting married customers, customers with salary > 30.52 and etc)"? If you mean that the user typed in "30.52", then it works fine to put that into a TEXT column and generate the query
with salary > "30.52"
That is, numeric values can be fed into queries as strings; the datatype does not need to match. (However, "22/7" would be treated as 22, and "about 3.14" would be compared as 0.)
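A small sketch of how text answers behave when read as numbers. It rests on some assumptions: it uses Python's built-in sqlite3 rather than MySQL, and it casts explicitly with CAST(... AS REAL), because implicit conversion rules differ between engines and can depend on whether both operands are strings. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_answers ("
    "id INTEGER PRIMARY KEY, question_id INTEGER, profile_id INTEGER, answer TEXT)"
)

# Every answer is stored as text, exactly as the user typed it (made-up sample data).
rows = [(3, 1, "30.65"), (3, 2, "100"), (3, 3, "22/7"), (3, 4, "about 3.14")]
conn.executemany(
    "INSERT INTO profile_answers (question_id, profile_id, answer) VALUES (?, ?, ?)",
    rows,
)

# The cast keeps the longest leading numeric prefix; with no such prefix it yields 0.
as_numbers = conn.execute(
    "SELECT answer, CAST(answer AS REAL) FROM profile_answers ORDER BY profile_id"
).fetchall()

# Numeric comparison over the text column, here "salary > 30.52".
above_threshold = [
    r[0]
    for r in conn.execute(
        "SELECT profile_id FROM profile_answers "
        "WHERE question_id = 3 AND CAST(answer AS REAL) > 30.52 "
        "ORDER BY profile_id"
    )
]
print(as_numbers)
print(above_threshold)
```

Clean numeric strings compare as expected, while free-form answers such as "22/7" or "about 3.14" silently collapse to their numeric prefix or to zero.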
Millions of rows -- Will you be looking through all of them at once? Very often? If you have 50 students answering 40 questions, that is a paltry 2000 Q&A to go through at a single time; not millions.
Answer 2:
Relational databases work best when you know the schema at design time. Your requirement is one where the schema (for each quiz) is known only at run time.
EAV seems like a workaround - and it does, indeed, allow you to store the data. But querying it very quickly becomes almost impossible - imagine trying to find all customers over the age of 32 who earn between 20 and 30K and who are getting married. Using appropriate data types will help a little, but your queries will rapidly become extremely complex and very hard to optimize.
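To make the query problem concrete, here is a sketch of that three-condition search against an EAV-style answers table. The assumptions: Python's built-in sqlite3 stands in for MySQL, the question IDs (AGE, SALARY, MARRIED) and sample rows are hypothetical, and the salary figures are in arbitrary units.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (profile_id INTEGER, question_id INTEGER, "
    "integer_value INTEGER, decimal_value REAL, boolean_value INTEGER)"
)

AGE, SALARY, MARRIED = 2, 3, 5  # hypothetical question IDs

# One row per (profile, question); made-up sample data.
data = [
    (1, AGE, 40, None, None), (1, SALARY, None, 25.0, None), (1, MARRIED, None, None, 1),
    (2, AGE, 28, None, None), (2, SALARY, None, 22.0, None), (2, MARRIED, None, None, 1),
    (3, AGE, 50, None, None), (3, SALARY, None, 45.0, None), (3, MARRIED, None, None, 0),
]
conn.executemany("INSERT INTO answers VALUES (?, ?, ?, ?, ?)", data)

# Each attribute in the filter needs its own self-join on the same table.
query = """
    SELECT age.profile_id
    FROM answers AS age
    JOIN answers AS salary
      ON salary.profile_id = age.profile_id AND salary.question_id = ?
    JOIN answers AS married
      ON married.profile_id = age.profile_id AND married.question_id = ?
    WHERE age.question_id = ?
      AND age.integer_value > 32
      AND salary.decimal_value BETWEEN 20 AND 30
      AND married.boolean_value = 1
"""
matches = [r[0] for r in conn.execute(query, (SALARY, MARRIED, AGE))]
print(matches)
```

Every additional condition adds another join against the same table, which is where the complexity and the optimization difficulty come from.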
The good news is that most database engines support either XML or JSON documents, with pretty good query performance. MySQL does JSON pretty well, for instance.
I would model my system using the major concepts as relational entities (customer, quiz, question), and store the answers as JSON in MySQL.
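A rough sketch of the JSON approach, under clearly stated assumptions: it uses Python's built-in sqlite3 and SQLite's json_extract function as a stand-in for MySQL's JSON support (it needs an SQLite build with the JSON functions enabled), and the quiz_answers table, the key names, and the sample documents are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# One JSON document of answers per profile (hypothetical table and keys).
conn.execute("CREATE TABLE quiz_answers (profile_id INTEGER PRIMARY KEY, answers TEXT)")

documents = {
    1: {"age": 40, "salary": 25.0, "married": True},
    2: {"age": 28, "salary": 22.0, "married": True},
    3: {"age": 50, "salary": 45.0, "married": False},
}
conn.executemany(
    "INSERT INTO quiz_answers (profile_id, answers) VALUES (?, ?)",
    [(profile_id, json.dumps(doc)) for profile_id, doc in documents.items()],
)

# All three conditions are evaluated against a single row per profile.
matches = [
    r[0]
    for r in conn.execute(
        """
        SELECT profile_id
        FROM quiz_answers
        WHERE json_extract(answers, '$.age') > 32
          AND json_extract(answers, '$.salary') BETWEEN 20 AND 30
          AND json_extract(answers, '$.married') = 1
        ORDER BY profile_id
        """
    )
]
print(matches)
```

Compared with an EAV layout, the multi-attribute filter needs no self-joins, because all of a profile's answers live in one document.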
Answer 3:
Having NULL values is never a good practice. In option 2 you have 8 columns and 4 of them are NULL. Adding a column for each answer's data type is bad practice.
Using a single column with a string data type is a workaround that avoids adding a nullable column for each answer's data type.
You will get better performance with the second approach. If you need even better performance, you can go with a separate table for each answer data type.
I have experience with quizzes and forms. We needed rules for conditionally showing pages, questions, and so on. We discussed it a lot, and I had trouble convincing the team that we should switch to JSON or something like that (NoSQL). A relational database is not intended to solve this kind of problem.
The right way would be to separate quiz questions and answers from statistics. You should move questions, answers, rules, and that kind of stuff to NoSQL. Answer and question statistics should be in the relational database.
Answer 4:
According to the official MySQL documentation about String Type Storage Requirements a TEXT data type requires L + 2 bytes, where L is the length of the string. So, if the answer given is the numeric value 100 it will use 5 bytes.
Whereas, according to the docs entry about Numeric Type Storage Requirements, the same numeric value (100) would take only a single byte when using the TINYINT datatype (unsigned max 255) or 2 bytes when using the SMALLINT type (unsigned max 65535).
So, if you are storing a large number of rows in your table, the storage can be up to 5 times greater for values less than 256 when saved as TEXT.
For higher values the difference in bytes grows as well:
60000 as TEXT = 7 bytes
60000 as SMALLINT UNSIGNED = 2 bytes
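The arithmetic above can be reproduced with a short sketch. The helper names text_bytes and smallest_unsigned_type are hypothetical; the sizes follow the L + 2 rule for TEXT quoted above and the fixed sizes of MySQL's unsigned integer types.

```python
# Fixed storage sizes (bytes) and unsigned maxima of MySQL integer types.
INTEGER_TYPES = [
    ("TINYINT", 1, 2**8 - 1),
    ("SMALLINT", 2, 2**16 - 1),
    ("MEDIUMINT", 3, 2**24 - 1),
    ("INT", 4, 2**32 - 1),
    ("BIGINT", 8, 2**64 - 1),
]


def text_bytes(value):
    """Bytes used by a TEXT column: L + 2, where L is the string's byte length."""
    return len(str(value).encode("utf-8")) + 2


def smallest_unsigned_type(value):
    """Return (type name, bytes) of the smallest unsigned integer type that fits."""
    for name, size, maximum in INTEGER_TYPES:
        if 0 <= value <= maximum:
            return name, size
    raise ValueError(f"{value} does not fit in an unsigned integer type")


for number in (100, 60000):
    name, size = smallest_unsigned_type(number)
    print(f"{number}: TEXT = {text_bytes(number)} bytes, {name} UNSIGNED = {size} bytes")
```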
I would recommend that you restructure your data model so that the data is stored with the correct type. This will also ensure that indexing is more performant, that string and numeric functions work correctly, and that app-level abstraction layers such as Doctrine map the values to the correct language type in PHP, etc.
Here is a good article about optimizing MySQL data model design.
Source: https://stackoverflow.com/questions/60298337/what-is-the-best-practice-to-save-a-quiz-answers-with-different-data-types-in-my