Key/Value pairs in a database table

Question

I need to design a key/value table in my database and I'm looking for guidance on the best way to do this. Basically, I need to be able to associate values with a dynamic set of named properties and apply them to an external key.

The operations I need to be able to support are:

Apply a key/value pair to a group of items
Enumerate all of the currently-active keys
Determine all of the items that have a value for a given key
Determine all of the items where the value associated with a given key matches some criteria.

It seems that the simplest way to do this is to define a table:

CREATE TABLE KeyValue (
  id    int,
  Key   varchar...,
  Value varchar...
);

It seems that I am likely to be duplicating a lot of data in the Key column because any given key is likely to be defined for a large number of documents. Replacing the Key varchar with an integer lookup into another table seems to alleviate this problem (and make it significantly more efficient to enumerate all of the active keys), but sticks me with the problem of maintaining that lookup table (upserting into it whenever I want to define a property and potentially removing the entry any time a key/value is cleared).

What's the best way to do this?

Answer 1:

You are employing a database model called Entity-Attribute-Value. This is a common way to store key/value pairs in a relational database, but it has a number of weaknesses with respect to database normalization and efficiency.

Yes, the table design you showed is the most common way to do it. In this design, every attribute of every entity gets a distinct row in your KeyValue table.

Apply a key/value pair to a group of items: You need to add one row for each item in the group.

INSERT INTO KeyValue (id, key, value) VALUES (101, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (102, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (103, 'color', 'green');

You may also prepare the INSERT statement with parameters and execute it in a loop over the item ids.
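As a rough sketch of the parameterized approach, the snippet below uses Python's built-in sqlite3 module. That choice is an assumption made only to keep the example runnable (the question is tagged for SQL Server), and the helper name `apply_pair` is hypothetical. The column names are quoted because `Key` can clash with reserved words in some databases.

```python
import sqlite3

def apply_pair(conn, item_ids, key, value):
    """Insert one (id, key, value) row per item with a single parameterized statement."""
    conn.executemany(
        'INSERT INTO KeyValue (id, "Key", "Value") VALUES (?, ?, ?)',
        [(item_id, key, value) for item_id in item_ids],
    )
    conn.commit()

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE KeyValue (id INTEGER, "Key" TEXT, "Value" TEXT)')

apply_pair(conn, [101, 102, 103], "color", "green")
rows = conn.execute(
    'SELECT id, "Key", "Value" FROM KeyValue ORDER BY id'
).fetchall()
print(rows)
```

The same pattern should carry over to other drivers, although the placeholder syntax (`?` here) varies between them.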

Enumerate all of the currently-active keys:

SELECT DISTINCT Key FROM KeyValue;

Determine all of the items that have a value for a given key:

SELECT id FROM KeyValue WHERE Key = 'color';

Determine all of the items where the value associated with a given key matches some criteria:

SELECT id FROM KeyValue WHERE Key = 'color' AND Value = 'green';

Some of the problems with Entity-Attribute-Value are:

No way to make sure keys are spelled the same for all items
No way to make some keys mandatory for all items (i.e. NOT NULL in a conventional table design).
All keys must use VARCHAR for the value; can't store different data types per key.
No way to use referential integrity; can't make a FOREIGN KEY that applies to values of some keys and not others.
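To illustrate the data type limitation in the list above, here is a small sketch using Python's built-in sqlite3 module (an assumption for the sake of a runnable example; the `weight` key and its values are made up). Because every value is stored as text, a range comparison follows string ordering unless each query casts explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE KeyValue (id INTEGER, "Key" TEXT, "Value" TEXT)')
conn.executemany(
    'INSERT INTO KeyValue (id, "Key", "Value") VALUES (?, ?, ?)',
    [(101, "weight", "9"), (102, "weight", "10"), (103, "weight", "25")],
)

# Text comparison: '10' and '25' sort before '5', so only item 101 matches.
as_text = conn.execute(
    'SELECT id FROM KeyValue WHERE "Key" = ? AND "Value" > ? ORDER BY id',
    ("weight", "5"),
).fetchall()

# Numeric comparison: every query has to remember the explicit cast.
as_number = conn.execute(
    'SELECT id FROM KeyValue '
    'WHERE "Key" = ? AND CAST("Value" AS INTEGER) > ? ORDER BY id',
    ("weight", 5),
).fetchall()

print(as_text)    # [(101,)]
print(as_number)  # [(101,), (102,), (103,)]
```

Nothing in the schema stops a non-numeric string from being stored under `weight`, which is the kind of check a typed column would normally provide.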

Basically, Entity-Attribute-Value is not a normalized database design.

Answer 2:

Don't optimize this unless you have to. What is the average length of a key? Will this table be so big it won't all fit into your server's memory if you implement it the naive way? I'd suggest implementing it the simplest way, measuring the performance, and then re-implementing only if performance is a problem.

If performance is a problem, then using an integer key and a separate table is probably the way to go (JOINs on integer columns are typically faster than JOINs on variable-length string columns). But the first rule of optimizing is MEASURE FIRST: make sure your supposedly optimized code actually does make things run faster.
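For reference, the integer-key variant might look roughly like the sketch below. It uses Python's built-in sqlite3 module (an assumption, chosen so the example is self-contained), and the table name `KeyName`, its columns, and the helper `set_value` are all hypothetical. SQLite's `INSERT OR IGNORE` stands in for the upsert; other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE KeyName (
        key_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE KeyValue (
        id     INTEGER NOT NULL,
        key_id INTEGER NOT NULL REFERENCES KeyName (key_id),
        value  TEXT,
        PRIMARY KEY (id, key_id)
    );
""")

def set_value(conn, item_id, name, value):
    """Register the key name if it is new, then store the value for the item."""
    conn.execute("INSERT OR IGNORE INTO KeyName (name) VALUES (?)", (name,))
    (key_id,) = conn.execute(
        "SELECT key_id FROM KeyName WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO KeyValue (id, key_id, value) VALUES (?, ?, ?)",
        (item_id, key_id, value),
    )

for item_id in (101, 102, 103):
    set_value(conn, item_id, "color", "green")
set_value(conn, 101, "size", "large")

# Enumerating key names reads the small lookup table, not KeyValue.
key_names = [row[0] for row in conn.execute("SELECT name FROM KeyName ORDER BY name")]

# Finding items by key name joins on the integer column.
green_items = [row[0] for row in conn.execute(
    """SELECT kv.id FROM KeyValue AS kv
       JOIN KeyName AS kn ON kn.key_id = kv.key_id
       WHERE kn.name = ? AND kv.value = ?
       ORDER BY kv.id""",
    ("color", "green"),
)]

print(key_names)
print(green_items)
```

Note that in this sketch a key name stays in `KeyName` after its last value is deleted, so listing only the currently active keys would still need a cleanup step or a query against `KeyValue`. Whether any of this is faster than the single-table version is exactly what should be measured.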

Answer 3:

An option that may be worth exploring is digesting the key using SHA1 or MD5 before inserting it into the table.

That will allow you to get rid of the lookup table, but you will not be able to iterate through the keys, because a hash only goes one way.
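A minimal sketch of the digest idea, assuming Python's hashlib and sqlite3 modules and a hypothetical `key_hash` column, could look like this:

```python
import hashlib
import sqlite3

def key_digest(name):
    """Return the SHA-1 hex digest of a key name (40 hex characters)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE KeyValue (id INTEGER, key_hash TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO KeyValue (id, key_hash, value) VALUES (?, ?, ?)",
    [(101, key_digest("color"), "green"), (102, key_digest("size"), "large")],
)

# Lookup by name still works: hash the name and compare digests.
color_items = conn.execute(
    "SELECT id FROM KeyValue WHERE key_hash = ?", (key_digest("color"),)
).fetchall()

# Enumeration only yields digests; the original names are not stored anywhere.
stored_keys = [
    row[0] for row in conn.execute("SELECT DISTINCT key_hash FROM KeyValue")
]

print(color_items)
print(stored_keys)
```

A SHA-1 hex digest is 40 characters long, so this only reduces storage when key names are longer than that. Storing the raw digest bytes instead would halve the size.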

Answer 4:

Create updatable views! Also check this for an example.

Answer 5:

It seems to me like you might have a couple of design choices.

Choice 1: The two-table design you hinted at in your question

keys (
 id int not null auto_increment
 key string/int
)
values (
 id int not null auto_increment
 key_id int
 value string/varchar/int
)

Choice 2: perhaps as sambo99 pointed out you could modify this:

keys (
 id int not null auto_increment
 key string/int
 hash_code int -- this would be computed by the inserting code, so that lookups would effectively have the id, and you can look them up directly
)

values (
 id int not null auto_increment -- this column might be nice since your hash_codes might collide, and this will make deletes/updates easier
 key_id int -- this column becomes optional
 hash_code int
 value string/varchar/int...
)

Answer 6:

Key-value pairs are generally not a good use of relational databases. The benefits of relational databases are the constraints, validation and structure that go with them. By using a generic key-value structure in your table you are losing the validation and constraints that make relational databases good. If you want the flexible design of key-value pairs, you would be best served by a NoSQL database like MongoDB or its ilk.

Key-value pairs (e.g. NoSQL databases) work best when the underlying data is unstructured, unpredictable, or changing often. If you don't have structured data, a relational database is going to be more trouble than it's worth because you will need to make lots of schema changes and/or jump through hoops to conform your data to the ever-changing structure.

KVP / JSON / NoSQL is great because changes to the data structure do not require completely refactoring the data model. Adding a field to your data object is simply a matter of adding it to the data. The other side of the coin is that there are fewer constraints and validation checks in a KVP / NoSQL database than in a relational database, so your data might get messy.

There are performance and space-saving benefits to relational data models. Normalized relational data can make understanding and validating the data easier because there are table key relationships and constraints to help you out. This will make your application easier to maintain and support in the long term. Another approach is to use a data abstraction layer in your code, like Django or SQLAlchemy for Python, or Entity Framework for .NET. That way, as your code changes, your database will change with it automatically.

One of the worst patterns I've seen is trying to have it both ways. Trying to put key-value pairs into a relational database is often a recipe for disaster. I would recommend using the technology that best suits your data.

Source: https://stackoverflow.com/questions/514603/key-value-pairs-in-a-database-table

Tags

sql

sql-server

tsql

entity-attribute-value