SQL: choose all baskets containing a set of particular items

Posted by 自古美人都是妖i on 2020-01-16 01:36:09

Question


Eddy has baskets with items. Each item can belong to an arbitrary number of baskets, or to none of them.

The SQL schema representing this is as follows:

tbl_basket
- basketId

tbl_item
- itemId

tbl_basket_item
- pkId
- basketId
- itemId

Question: how to select all baskets containing a particular set of items?

UPDATE. Only baskets containing all of the items are needed; otherwise it would have been an easy task to solve.

UPDATE B. I have implemented the following solution, including SQL generation in PHP:

SELECT basketId
FROM   tbl_basket
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 1  ) AS t0 USING(basketId)
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)

where the number of JOINs equals the number of items.
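
The query generation described above could look roughly like this. The original was written in PHP and is not shown, so this is only a Python sketch; the helper name build_join_query, the sample rows, and the use of SQLite (via the standard sqlite3 module) in place of MySQL are all assumptions made for illustration.

```python
import sqlite3

def build_join_query(item_ids):
    """Build one JOIN per required item (hypothetical helper, not the author's PHP)."""
    if not item_ids:
        raise ValueError("at least one item id is required")
    joins = [
        f"JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?) AS t{n} USING (basketId)"
        for n in range(len(item_ids))
    ]
    sql = "SELECT tbl_basket.basketId FROM tbl_basket\n" + "\n".join(joins)
    # Placeholders are bound in textual order, so params follow item_ids.
    return sql, list(item_ids)

# Assumed sample data: basket 1 holds all three items, basket 2 only two of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
    CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER);
    INSERT INTO tbl_basket VALUES (1), (2);
    INSERT INTO tbl_basket_item (basketId, itemId) VALUES
        (1, 1), (1, 15), (1, 488), (2, 1), (2, 15);
""")

sql, params = build_join_query([1, 15, 488])
matching = [row[0] for row in conn.execute(sql, params)]
print(matching)
```

Binding the item IDs as parameters rather than concatenating them into the SQL text keeps the generated query safe even when the IDs come from user input.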

That works well unless some of the items are included in almost every basket; then performance drops dramatically.

UPDATE B+. To resolve the performance issues, a heuristic is applied. First, select the frequency of each item. If it exceeds some threshold, the item is left out of the JOINs and you either:

  • apply post-filtering in PHP
  • or simply don't filter by that particular itemId, giving the user approximate results in a reasonable amount of time
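
A rough sketch of the first variant (post-filtering) is below. It is written in Python against SQLite rather than PHP against MySQL, and the function name find_baskets, the definition of frequency as "share of baskets containing the item", the threshold value, and the sample rows are all assumptions.

```python
import sqlite3

def find_baskets(conn, item_ids, threshold):
    """JOIN only on rare items, then post-filter the frequent ones in application code."""
    total = conn.execute("SELECT COUNT(*) FROM tbl_basket").fetchone()[0]
    if total == 0 or not item_ids:
        return []
    marks = ",".join("?" * len(item_ids))
    # Frequency of an item = number of baskets containing it / number of baskets.
    counts = dict(conn.execute(
        "SELECT itemId, COUNT(DISTINCT basketId) FROM tbl_basket_item "
        f"WHERE itemId IN ({marks}) GROUP BY itemId", item_ids))
    ranked = sorted(item_ids, key=lambda i: counts.get(i, 0))
    # Keep at least the least frequent item in SQL so the query stays selective.
    rare = [i for i in ranked if counts.get(i, 0) / total <= threshold] or ranked[:1]
    common = set(item_ids) - set(rare)
    joins = "".join(
        f" JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?) AS t{n} USING (basketId)"
        for n in range(len(rare)))
    candidates = [r[0] for r in conn.execute(
        "SELECT DISTINCT tbl_basket.basketId FROM tbl_basket" + joins, rare)]
    result = []
    for basket_id in candidates:
        owned = {r[0] for r in conn.execute(
            "SELECT itemId FROM tbl_basket_item WHERE basketId = ?", (basket_id,))}
        if common <= owned:  # post-filter: every frequent item must be present too
            result.append(basket_id)
    return sorted(result)

# Assumed sample data: item 1 is in 4 of 5 baskets, items 15 and 488 in 2 of 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
    CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER);
    INSERT INTO tbl_basket VALUES (1), (2), (3), (4), (5);
    INSERT INTO tbl_basket_item (basketId, itemId) VALUES
        (1, 1), (2, 1), (3, 1), (4, 1), (1, 15), (5, 15), (1, 488), (5, 488);
""")
print(find_baskets(conn, [1, 15, 488], threshold=0.5))
```

Here basket 5 passes the SQL stage (it has items 15 and 488) and is dropped only by the post-filter, because it lacks the frequent item 1.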

UPDATE B++. It seems the current problem has no nice solution in MySQL. This raises one question and one possible solution:

  • (question) Does PostgreSQL have advanced indexing techniques that allow solving this problem without a full scan?
  • (solution) It seems it could be solved nicely in Redis, using sets and the SINTER command to get an intersection.
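
To illustrate the set-intersection idea behind the Redis suggestion, here is a small Python model. It does not talk to a Redis server: the dictionary, the sadd and sinter helpers, and the "item:<id>" key naming are stand-ins assumed for this sketch. The layout is one set of basket IDs per item, and the answer is the intersection of the sets for the wanted items.

```python
# In-memory stand-in for Redis: one set of basket ids per item (key names are assumed).
store = {}

def sadd(key, *members):
    """Add members to the set stored at key, creating the set if needed."""
    store.setdefault(key, set()).update(members)

def sinter(*keys):
    """Intersect the sets at the given keys; a missing key counts as an empty set."""
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()

sadd("item:1", 1, 2, 3)    # item 1 is in baskets 1, 2 and 3
sadd("item:15", 1, 2)      # item 15 is in baskets 1 and 2
sadd("item:488", 1, 3)     # item 488 is in baskets 1 and 3

baskets = sinter("item:1", "item:15", "item:488")
print(sorted(baskets))
```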

Answer 1:


I think the best way is to create a temporary table holding the set of needed items (for example, filled by a procedure that takes the item IDs as parameters) and then left join it with all of the above tables joined together.

If, for a given basketId, there are no NULLs on the right side of the left join, the basket contains all the needed items.
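
One possible reading of this approach, sketched in Python with SQLite standing in for MySQL. The temporary table name wanted_item, the exact shape of the joins, and the sample rows are assumptions, since the answer gives no query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
    CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER);
    INSERT INTO tbl_basket VALUES (1), (2), (3);
    INSERT INTO tbl_basket_item (basketId, itemId) VALUES
        (1, 1), (1, 15), (1, 488), (2, 1), (2, 15), (3, 488);
    CREATE TEMP TABLE wanted_item (itemId INTEGER PRIMARY KEY);  -- hypothetical name
""")
conn.executemany("INSERT INTO wanted_item VALUES (?)", [(1,), (15,), (488,)])

# Pair every basket with every wanted item, then LEFT JOIN the actual contents.
# COUNT(bi.itemId) skips NULLs, so the counts are equal only when no wanted
# item is missing from the basket.
rows = conn.execute("""
    SELECT b.basketId
    FROM tbl_basket AS b
    CROSS JOIN wanted_item AS w
    LEFT JOIN tbl_basket_item AS bi
           ON bi.basketId = b.basketId AND bi.itemId = w.itemId
    GROUP BY b.basketId
    HAVING COUNT(*) = COUNT(bi.itemId)
    ORDER BY b.basketId
""").fetchall()
complete = [r[0] for r in rows]
print(complete)
```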




Answer 2:


-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
        ( basketid INTEGER NOT NULL REFERENCES basket (basketid)
        , itemid INTEGER NOT NULL REFERENCES item (itemid)
        , PRIMARY KEY (basketid, itemid)
        );

-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
        SELECT * FROM item i
        WHERE i.itemid IN (1,15,488)
        AND NOT EXISTS (
                SELECT * FROM basket_item bi
                WHERE bi.basketid = b.basketid
                AND bi.itemid = i.itemid
                )
        );
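
For anyone who wants to try the query above, here is a small harness in Python using SQLite. The sample rows are made up, and the IN list is parameterized so the query can be run with different item sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE basket_item
        ( basketid INTEGER NOT NULL REFERENCES basket (basketid)
        , itemid INTEGER NOT NULL REFERENCES item (itemid)
        , PRIMARY KEY (basketid, itemid)
        );
    INSERT INTO basket VALUES (1), (2), (3);
    INSERT INTO item VALUES (1), (7), (15), (488);
    INSERT INTO basket_item VALUES
        (1, 1), (1, 7), (1, 15), (1, 488), (2, 1), (2, 15), (3, 7);
""")

# "There is no wanted item that this basket lacks."
QUERY = """
    SELECT b.basketid FROM basket b
    WHERE NOT EXISTS (
        SELECT * FROM item i
        WHERE i.itemid IN (?, ?, ?)
        AND NOT EXISTS (
            SELECT * FROM basket_item bi
            WHERE bi.basketid = b.basketid
            AND bi.itemid = i.itemid
        )
    )
    ORDER BY b.basketid
"""
complete = [r[0] for r in conn.execute(QUERY, (1, 15, 488))]
# 999 is not present in the item table, so it imposes no constraint.
with_unknown = [r[0] for r in conn.execute(QUERY, (1, 15, 999))]
print(complete, with_unknown)
```

Note that the wanted IDs are looked up in the item table, so an ID that does not exist there is silently ignored rather than making the result empty.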



Answer 3:


If you are going to provide the list of items, then replace id1, id2, etc. in the query below:

select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)

This returns all baskets containing at least one of the listed items; it does not require that a basket contain all of them. No other tables need to be joined, as the requirement doesn't involve them.




Answer 4:


The simplest solution is to use a HAVING clause.

SELECT basketId
FROM   tbl_basket_item                -- itemId lives in the link table, not in tbl_basket
WHERE  itemId IN (1, 15, 488)
GROUP BY basketId                     -- GROUP BY must come before HAVING
HAVING COUNT(DISTINCT itemId) = 3     -- DISTINCT in case we have duplicate items in a basket
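
A quick way to check the GROUP BY / HAVING approach, and to see how it differs from a plain IN filter, is the following Python sketch using SQLite. The sample rows are invented, and the query is written with GROUP BY before HAVING and against tbl_basket_item, which is where itemId is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER);
    INSERT INTO tbl_basket_item (basketId, itemId) VALUES
        (1, 1), (1, 15), (1, 488), (1, 488),  -- item 488 listed twice in basket 1
        (2, 1), (2, 1), (2, 15),              -- three matching rows, only two distinct items
        (3, 488);
""")

wanted = [1, 15, 488]
marks = ",".join("?" * len(wanted))

# Plain IN: baskets containing ANY of the wanted items.
any_match = [r[0] for r in conn.execute(
    f"SELECT DISTINCT basketId FROM tbl_basket_item "
    f"WHERE itemId IN ({marks}) ORDER BY basketId", wanted)]

# IN plus GROUP BY / HAVING: baskets containing ALL of the wanted items.
all_match = [r[0] for r in conn.execute(
    f"SELECT basketId FROM tbl_basket_item WHERE itemId IN ({marks}) "
    f"GROUP BY basketId HAVING COUNT(DISTINCT itemId) = ? ORDER BY basketId",
    wanted + [len(wanted)])]

print(any_match, all_match)
```

Basket 2 has three matching rows but only two distinct items, which is why the count uses DISTINCT.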


Source: https://stackoverflow.com/questions/32391831/sql-choose-all-baskets-containing-a-set-of-particular-items
