Question
Eddy has baskets with items. Each item can belong to an arbitrary number of baskets, or to none of them.
The SQL schema that represents it is as follows:
tbl_basket
- basketId
tbl_item
- itemId
tbl_basket_item
- pkId
- basketId
- itemId
Question: how to select all baskets containing a particular set of items?
UPDATE. Only baskets that contain all of the items are needed. Otherwise it would have been an easy task.
UPDATE B. The following solution has been implemented, with the SQL generated in PHP:
SELECT basketId
FROM tbl_basket
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 1 ) AS t0 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)
where the number of JOINs equals the number of items.
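A minimal sketch of the JOIN-per-item generation, written in Python with the standard `sqlite3` module instead of PHP and MySQL so that it runs stand-alone (that substitution is an assumption). The sample baskets and the helper names `make_db`, `build_join_query` and `baskets_with_all` are hypothetical; item ids are passed as `?` placeholders rather than pasted into the SQL string.

```python
import sqlite3


def make_db():
    """Hypothetical sample data: baskets 1 and 3 hold items 1, 15 and 488."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_item (itemId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY,
            basketId INTEGER NOT NULL REFERENCES tbl_basket (basketId),
            itemId INTEGER NOT NULL REFERENCES tbl_item (itemId)
        );
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_item VALUES (1), (7), (15), (488);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def build_join_query(item_ids):
    """Emit one derived-table JOIN per required item, like the PHP generator."""
    joins = "\n".join(
        f"JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?) AS t{n} USING(basketId)"
        for n in range(len(item_ids))
    )
    sql = f"SELECT basketId\nFROM tbl_basket\n{joins}\nORDER BY basketId"
    return sql, list(item_ids)


def baskets_with_all(conn, item_ids):
    # Each inner JOIN drops baskets lacking that item, so the survivors hold all of them.
    sql, params = build_join_query(item_ids)
    return [row[0] for row in conn.execute(sql, params)]


conn = make_db()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
```

Basket 2 is excluded because the JOIN on item 488 finds no matching row for it.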
That works well unless some of the items are included in almost every basket; then performance drops dramatically.
UPDATE B+. To resolve the performance issues, a heuristic is applied. First, select the frequency of each item. If it exceeds some threshold, leave that item out of the JOINs and either:
- apply post-filtering in PHP
- or simply skip the filter on that itemId, giving the user approximate results in a reasonable amount of time
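The B+ heuristic could be sketched as follows, again in Python with `sqlite3` standing in for PHP and MySQL. The `threshold` parameter (the maximum number of baskets an item may appear in and still be used as a JOIN filter) and the `exact` flag (post-filter versus approximate results) are hypothetical knobs, not part of the original; the sample data is made up.

```python
import sqlite3


def make_db():
    """Hypothetical sample data: baskets 1 and 3 hold items 1, 15 and 488."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY,
            basketId INTEGER NOT NULL,
            itemId INTEGER NOT NULL
        );
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def item_frequencies(conn, item_ids):
    """Number of distinct baskets each requested item appears in (0 if none)."""
    placeholders = ", ".join("?" for _ in item_ids)
    rows = conn.execute(
        f"SELECT itemId, COUNT(DISTINCT basketId) FROM tbl_basket_item "
        f"WHERE itemId IN ({placeholders}) GROUP BY itemId",
        list(item_ids),
    )
    freq = dict.fromkeys(item_ids, 0)
    freq.update(dict(rows))
    return freq


def baskets_with_all_heuristic(conn, item_ids, threshold, exact=True):
    # Hypothetical knobs: items in more than `threshold` baskets are "frequent";
    # `exact` decides whether frequent items are post-filtered or ignored.
    freq = item_frequencies(conn, item_ids)
    selective = [i for i in item_ids if freq[i] <= threshold]
    frequent = {i for i in item_ids if freq[i] > threshold}

    # SQL pass: JOIN only on the selective (rarer) items.
    joins = " ".join(
        f"JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?) AS t{n} USING(basketId)"
        for n in range(len(selective))
    )
    candidates = [r[0] for r in conn.execute(
        f"SELECT basketId FROM tbl_basket {joins} ORDER BY basketId", selective)]
    if not exact or not frequent:
        return candidates  # approximate mode, or nothing left to check

    # Post-filter pass in application code (one query per candidate, kept simple).
    result = []
    for basket_id in candidates:
        items = {r[0] for r in conn.execute(
            "SELECT itemId FROM tbl_basket_item WHERE basketId = ?", (basket_id,))}
        if frequent <= items:
            result.append(basket_id)
    return result


print(baskets_with_all_heuristic(make_db(), [1, 15, 488], threshold=2))  # [1, 3]
```

With `threshold=2`, items 1 and 15 (each in 3 baskets) are treated as frequent, so only item 488 is joined and the other two are checked afterwards.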
UPDATE B++. It seems that this problem has no nice solution in MySQL. This raises one question and one possible solution:
- (question) Does PostgreSQL have some advanced indexing technique that allows this problem to be solved without a full scan?
- (solution) It seems that it could be solved nicely in Redis, using sets and the SINTER command to get their intersection.
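To illustrate the SINTER idea without a running Redis server, the sketch below keeps one Python set per item under a hypothetical key `item:<itemId>` and intersects them the way SINTER does: a key that does not exist counts as an empty set. It also walks the smallest set first; the Redis documentation gives SINTER's cost in terms of the smallest set's cardinality, which is why items present in almost every basket would not dominate the work. The helper names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_item_sets(pairs):
    """In-memory stand-in for Redis: like SADD item:<itemId> <basketId> per row."""
    sets = defaultdict(set)
    for basket_id, item_id in pairs:
        sets[f"item:{item_id}"].add(basket_id)
    return sets


def sinter(sets, keys):
    """Intersect the sets stored under `keys`, mimicking Redis SINTER semantics."""
    if not keys:
        raise ValueError("SINTER needs at least one key")
    # Missing keys behave as empty sets; scanning the smallest set ties the
    # cost to the rarest item rather than to the most frequent one.
    members = sorted((sets.get(key, set()) for key in keys), key=len)
    smallest, others = members[0], members[1:]
    return {m for m in smallest if all(m in s for s in others)}


# Hypothetical sample rows (basketId, itemId).
rows = [(1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
        (3, 1), (3, 15), (3, 488), (3, 7), (4, 7)]
item_sets = build_item_sets(rows)
print(sorted(sinter(item_sets, ["item:1", "item:15", "item:488"])))  # [1, 3]
```

With a real Redis server the same shape would be one SADD per (item, basket) pair when data is written, and a single SINTER over the requested keys at query time.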
Answer 1:
I think the best way is to create a temporary table holding the set of needed items (for example, filled by a procedure that takes the item ids as parameters) and then LEFT JOIN it with all of the above tables joined together.
If for a given basketid you have NO nulls on the right side of the left join, the basket contains all the needed items.
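A sketch of the temporary-table approach in Python with `sqlite3`; the temp table name `needed_items`, the helper names and the sample data are hypothetical. The CROSS JOIN pairs every basket with every needed item, the LEFT JOIN looks each pair up in `tbl_basket_item`, and `COUNT(*) = COUNT(bi.itemId)` is one way to express "no NULLs on the right side" per basket, since `COUNT(column)` skips NULLs.

```python
import sqlite3


def make_db():
    """Hypothetical sample data: baskets 1 and 3 hold items 1, 15 and 488."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY,
            basketId INTEGER NOT NULL,
            itemId INTEGER NOT NULL
        );
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def baskets_with_all(conn, item_ids):
    # Fill a temp table with the requested items (the "procedure" step).
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS needed_items (itemId INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM needed_items")
    conn.executemany("INSERT OR IGNORE INTO needed_items (itemId) VALUES (?)",
                     [(i,) for i in item_ids])
    # Every (basket, needed item) pair; a NULL bi.itemId means the basket lacks it.
    rows = conn.execute("""
        SELECT b.basketId
        FROM tbl_basket AS b
        CROSS JOIN needed_items AS n
        LEFT JOIN tbl_basket_item AS bi
               ON bi.basketId = b.basketId AND bi.itemId = n.itemId
        GROUP BY b.basketId
        HAVING COUNT(*) = COUNT(bi.itemId)  -- no NULLs on the right side
        ORDER BY b.basketId
    """)
    return [r[0] for r in rows]


conn = make_db()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
```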
Answer 2:
-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
( basketid INTEGER NOT NULL REFERENCES basket (basketid)
, itemid INTEGER NOT NULL REFERENCES item (itemid)
, PRIMARY KEY (basketid, itemid)
);
-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
    SELECT * FROM item i
    WHERE i.itemid IN (1,15,488)
      AND NOT EXISTS (
          SELECT * FROM basket_item bi
          WHERE bi.basketid = b.basketid
            AND bi.itemid = i.itemid
      )
);
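The double NOT EXISTS reads as "keep a basket if there is no requested item that it lacks" (relational division). The sketch below runs the same query, with the id list turned into placeholders, against hypothetical sample data in SQLite through Python's `sqlite3`. One caveat visible in the query itself: requested ids are looked up in `item`, so an id that does not exist in `item` is silently ignored instead of excluding every basket.

```python
import sqlite3


def make_db():
    """Answer 2's schema plus hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE basket_item
        ( basketid INTEGER NOT NULL REFERENCES basket (basketid)
        , itemid INTEGER NOT NULL REFERENCES item (itemid)
        , PRIMARY KEY (basketid, itemid)
        );
        INSERT INTO basket VALUES (1), (2), (3), (4);
        INSERT INTO item VALUES (1), (7), (15), (488);
        INSERT INTO basket_item VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def baskets_with_all(conn, item_ids):
    # "No requested item exists that this basket does not contain."
    placeholders = ", ".join("?" for _ in item_ids)
    sql = f"""
        SELECT b.basketid FROM basket b
        WHERE NOT EXISTS (
            SELECT * FROM item i
            WHERE i.itemid IN ({placeholders})
              AND NOT EXISTS (
                  SELECT * FROM basket_item bi
                  WHERE bi.basketid = b.basketid
                    AND bi.itemid = i.itemid))
        ORDER BY b.basketid"""
    return [r[0] for r in conn.execute(sql, list(item_ids))]


conn = make_db()
print(baskets_with_all(conn, [1, 15, 488]))       # [1, 3]
print(baskets_with_all(conn, [1, 15, 488, 999]))  # [1, 3]: 999 is not in item
```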
Answer 3:
If you are going to provide the list of items, replace id1, id2, etc. in the query below:
select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)
will give all baskets that contain at least one of the listed items. There is no need to join any other tables, since the requirement does not involve them.
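The `IN` list matches a row whose itemId equals any of the listed ids, so this query answers the "any item" variant that the question's first UPDATE rules out. A small Python/`sqlite3` check on hypothetical sample data makes the difference visible: basket 2 holds items 1 and 15 but not 488, yet it is returned.

```python
import sqlite3


def make_db():
    """Hypothetical sample rows: only baskets 1 and 3 hold all of 1, 15, 488."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY,
            basketId INTEGER NOT NULL,
            itemId INTEGER NOT NULL
        );
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def baskets_with_any(conn, item_ids):
    # Same shape as the query above: DISTINCT basketIds with at least one match.
    placeholders = ", ".join("?" for _ in item_ids)
    sql = (f"SELECT DISTINCT t.basketId FROM tbl_basket_item AS t "
           f"WHERE t.itemId IN ({placeholders}) ORDER BY t.basketId")
    return [r[0] for r in conn.execute(sql, list(item_ids))]


print(baskets_with_any(make_db(), [1, 15, 488]))  # [1, 2, 3]
```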
Answer 4:
The simplest solution is to use a HAVING clause.
SELECT basketId
FROM tbl_basket_item
WHERE itemId IN (1,15,488)
GROUP BY basketId
HAVING COUNT(DISTINCT itemId) = 3 -- DISTINCT in case a basket contains duplicate items
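A runnable sketch of the GROUP BY/HAVING approach in Python with `sqlite3`, reading from `tbl_basket_item` (the table that actually has `itemId`) and placing GROUP BY before HAVING, as SQL requires. The sample deliberately stores item 1 twice in basket 2 to show why `COUNT(DISTINCT itemId)` matters: a plain `COUNT(*)` would reach 3 for that basket even though it lacks item 488. The requested ids are also deduplicated so the count target matches. Data and helper names are hypothetical.

```python
import sqlite3


def make_db():
    """Hypothetical sample rows; basket 2 stores item 1 twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket_item (
            pkId INTEGER PRIMARY KEY,
            basketId INTEGER NOT NULL,
            itemId INTEGER NOT NULL
        );
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


def baskets_with_all(conn, item_ids):
    ids = sorted(set(item_ids))  # dedupe so the HAVING target is the real set size
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT basketId
        FROM tbl_basket_item
        WHERE itemId IN ({placeholders})
        GROUP BY basketId
        HAVING COUNT(DISTINCT itemId) = ?
        ORDER BY basketId"""
    return [r[0] for r in conn.execute(sql, [*ids, len(ids)])]


print(baskets_with_all(make_db(), [1, 15, 488]))  # [1, 3]
```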
Source: https://stackoverflow.com/questions/32391831/sql-choose-all-baskets-containing-a-set-of-particular-items