How to design a database schema to support tagging with categories?

懵懂的女人 提交于 2019-11-28 15:26:03

This is yet another variation on the Entity-Attribute-Value design.

A more recognizable EAV table looks like the following:

CREATE TABLE vehicleEAV (
  vid        INTEGER,
  attr_name  VARCHAR(20),
  attr_value VARCHAR(100),
  PRIMARY KEY (vid, attr_name),
  FOREIGN KEY (vid) REFERENCES vehicles (vid)
);

Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.

What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:

CREATE TABLE vehicleTag (
  vid         INTEGER,
  cid         INTEGER,
  tid         INTEGER,
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid),
  FOREIGN KEY (tid) REFERENCES tags(tid)
);

CREATE TABLE categories (
  cid        INTEGER PRIMARY KEY,
  category   VARCHAR(20) -- "attr_name"
);

CREATE TABLE tags (
  tid        INTEGER PRIMARY KEY,
  tag        VARCHAR(100) -- "attr_value"
);

If you're going to use the EAV design, you only need the vehicleTags and categories tables.

CREATE TABLE vehicleTag (
  vid         INTEGER,
  cid         INTEGER,     -- reference to "attr_name" lookup table
  tag         VARCHAR(100, -- "attr_value"
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid)
);

But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.

  • How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
  • How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
  • How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.

SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.

Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is Sesame.

I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.

I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.

So taking from your example the vehicle table would be something like:

vehicle
  vid
  vin
  make
  model

One way would be to slightly rethink your schema, normalising tag keys away from values:

vehicles
  int vid
  string vin

tags
  int tid
  int cid
  string key

categories
  int cid
  string category

vehicleTags
  int vid
  int tid
  string value

Now all you need is a unique constraint on vehicleTags(vid, tid).

Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?

I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.

These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!