I\'m sort of breaking my head over the Toxi solution for tag database schemas. I\'m working on a system to which users can submit items, and those items can have tags associated
First of all "toxi" is not a standard term. Always define your terms! Or at least provide relevant links.
And now to the question itself...
I'll have three databases.
No, you'll have 3 tables.
When adding a new item...
You are pretty much on the right track, with the exception that you can use the set-based nature of SQL to "merge" many of these steps. For example, tagging an item 1 with tags: 'tag1', 'tag2' and 'tag3' can be done like this...
INSERT IGNORE INTO tagmap (item_id, tag_id)
SELECT 1, tag_id FROM tags WHERE tag_text IN ('tag1', 'tag2', 'tag3');
The IGNORE
allows this to succeed even if item is already connected to some of these tags.
This assumes all required tags are already in tags
. Assuming tag.tag_id
is auto-increment, you can do something like this to ensure they are:
INSERT IGNORE INTO tags (tag_text) VALUES ('tag1'), ('tag2'), ('tag3');
This means we'll end up with an entry in the tagmap for every tag for every item. It seems correct, but I can't help but think there's a better way to do that then ending up with a huge amount of entries there...
There is no magic. If "item is connected to a particular tag" is piece of knowledge you want to record, then it will have to have some sort of physical representation in the database.
As for editing the tags...
You mean re-tagging items (not modifying tags themselves)?
To remove all tags that are not in the list, do something like this:
DELETE FROM tagmap
WHERE
item_id = 1
AND tag_id NOT IN (
SELECT tag_id FROM tags
WHERE tag_text IN ('tag1', 'tag3')
);
This will disconnect the item from all tags except 'tag1' and 'tag3'. Execute the INSERT above and this DELETE one after another to "cover" both adding and removing tags.
You can play with all this in the SQL Fiddle.
And just to be sure: when deleting tagmap rows, the related items won't be deleted with it because it points to a foreign key instead of acting as one, right?
Correct. A child endpoint of a FK will not trigger a referential action (such as ON DELETE CASCADE), only parent will.
BTW, you are using this schema because you want additional fields in tags
(beside tag_text
), right? If you do, not loosing this additional data just because all connections are gone is desired behavior.
But if you just wanted the tag_text
, you'd use a simpler schema where deleting all connections would be the same as deleting the tag itself:
This would not just simplify the SQL, it would also provide better clustering.
At first glance, "toxi" might look like it's saving space, but this might actually not be the case in practice, since it requires additional tables and indexes (and tags tend to be short).
Also, I may want to keep track of the amount of times a tag ... cron job ...
Measure before you decide to do something like this. My SQL Fiddle mentioned above uses a very deliberate order of fields in the tagmap
PK, so data is clustered in a way very friendly to this kind of counting (remember: InnoDB tables are clustered). You'd have to have a truly huge amount of items (or require unusually high performance) before this becomes a problem.
In any case, measure on realistic amounts of data!