Select at least one from each category?

闹比i 2021-02-09 07:29

SQLFiddle Link

I've got an SQLite database with a bunch of test/exam questions. Each question belongs to one question category.

3 Answers
  •  小蘑菇 (OP)
    2021-02-09 07:56

    The key to the answer is that the result contains two kinds of questions: for each category, one question that is constrained to come from that category; and some remaining questions, which may come from any category.

    First, the constrained questions: we just select one record from each category:

    SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
    FROM so_questions
    GROUP BY category_id
    

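    (This query relies on a feature introduced in SQLite 3.7.11, available in Android Jelly Bean or later: in an aggregate query such as SELECT a, max(b) ... GROUP BY ..., when there is exactly one max() (or min()) aggregate, the value of the bare column a is guaranteed to come from the row that has the maximum b value in its group.)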
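
    A minimal sketch of that bare-column behavior, using Python's built-in sqlite3 module (any SQLite 3.7.11 or later should behave the same). The table t, its score column, and the rows are hypothetical; fixed scores stand in for random() so the result is reproducible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "score" plays the role of random() in the query above.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, category_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 0.2), (2, 1, 0.9), (3, 1, 0.5),
     (4, 2, 0.7), (5, 2, 0.1)],
)

# The bare column "id" is taken from the row holding max(score) in each group.
rows = conn.execute(
    "SELECT id, category_id, max(score) FROM t "
    "GROUP BY category_id ORDER BY category_id"
).fetchall()
print(rows)  # [(2, 1, 0.9), (4, 2, 0.7)]
```

    Ids 2 and 4 are exactly the rows with the highest score in categories 1 and 2, which is what lets max(random()) pick one random row per category.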

    We also have to get the non-constrained questions (filtering out the duplicates that are already in the constrained set will happen in the next step):

    SELECT id, category_id, question_text, 0 AS constrained, random() AS r
    FROM so_questions
    

    When we combine these two queries with UNION ALL and then group by id, each constrained question lands in the same group as its unconstrained copy. Selecting max(constrained) then ensures that, for the groups that have duplicates, the bare columns come from the constrained row, so only the constrained version remains (all the other questions have only one row per group anyway).
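
    To see the deduplication step in isolation, here is a small sketch (Python's sqlite3 again, with hypothetical data). Instead of the random pick, it pretends that questions 2 and 3 were chosen as the constrained questions, and it uses fixed r values so the output is predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so_questions (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.executemany("INSERT INTO so_questions VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])

rows = conn.execute("""
    SELECT id, constrained, r, max(constrained)
    FROM (-- hypothetical fixed constrained picks, r = 100 + id
          SELECT id, 1 AS constrained, 100 + id AS r
          FROM so_questions WHERE id IN (2, 3)
          UNION ALL
          -- every question once more, unconstrained, r = id
          SELECT id, 0 AS constrained, id AS r
          FROM so_questions)
    GROUP BY id
    ORDER BY id
""").fetchall()

# Ids 2 and 3 appear once each, carrying the values from their constrained row.
print(rows)  # [(1, 0, 1, 0), (2, 1, 102, 1), (3, 1, 103, 1), (4, 0, 4, 0)]
```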

    Finally, the ORDER BY clause puts the constrained questions first, followed by the other questions in random order, and LIMIT 5 keeps the first five:

    SELECT *, max(constrained)
    FROM (SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
          FROM so_questions
          GROUP BY category_id
          UNION ALL
          SELECT id, category_id, question_text, 0 AS constrained, random() AS r
          FROM so_questions)
    GROUP BY id
    ORDER BY constrained DESC, r
    LIMIT 5
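
    One way to sanity-check the combined query is to run it many times against a small table and confirm that every result contains five distinct questions covering all categories. This sketch uses Python's sqlite3 module; the sample data (12 hypothetical questions, with category 3 holding only one) is made up for the check, and pick_questions is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so_questions "
             "(id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
# Hypothetical data: ids 1-8 in category 1, 9-11 in category 2, 12 in category 3.
data = [(i, 1 if i <= 8 else 2 if i <= 11 else 3, f"Question {i}")
        for i in range(1, 13)]
conn.executemany("INSERT INTO so_questions VALUES (?, ?, ?)", data)

# The combined query from the answer, unchanged.
QUERY = """
SELECT *, max(constrained)
FROM (SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
      FROM so_questions
      GROUP BY category_id
      UNION ALL
      SELECT id, category_id, question_text, 0 AS constrained, random() AS r
      FROM so_questions)
GROUP BY id
ORDER BY constrained DESC, r
LIMIT 5
"""

def pick_questions():
    """Return (id, category_id) pairs for one random selection."""
    return [(row[0], row[1]) for row in conn.execute(QUERY)]

# Every run should give 5 distinct ids and touch all three categories.
for _ in range(200):
    picked = pick_questions()
    assert len({qid for qid, _ in picked}) == 5
    assert {cat for _, cat in picked} == {1, 2, 3}
print(pick_questions())
```

    The lopsided data is deliberate: with only one question in category 3, a plain ORDER BY random() LIMIT 5 would often miss that category, while this query never does.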
    

    For earlier SQLite/Android versions, I haven't found a solution that avoids a temporary table: the subquery that picks the constrained questions must be used more than once, but because of random() it does not return the same rows each time it is evaluated:

    BEGIN TRANSACTION;
    
    CREATE TEMPORARY TABLE constrained AS
    SELECT (SELECT id
            FROM so_questions
            WHERE category_id = cats.category_id
            ORDER BY random()
            LIMIT 1) AS id
    FROM (SELECT DISTINCT category_id
          FROM so_questions) AS cats;
    
    SELECT ids.id, category_id, question_text
    FROM (SELECT id
          FROM (SELECT id, 1 AS c
                FROM constrained
                UNION ALL
                SELECT id, 0 AS c
                FROM so_questions
                WHERE id NOT IN (SELECT id FROM constrained))
          ORDER BY c DESC, random()
          LIMIT 5) AS ids
    JOIN so_questions ON ids.id = so_questions.id;
    
    DROP TABLE constrained;
    COMMIT TRANSACTION;
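
    The temporary-table variant can also be exercised from Python's sqlite3 module. This is a sketch with hypothetical sample data and a hypothetical helper pick_with_temp_table; since Python bundles a much newer SQLite, it only confirms the query logic, not behavior on pre-3.7.11 Android builds:

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT statements, as in the SQL script above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE so_questions "
             "(id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)")
# Hypothetical data: ids 1-8 in category 1, 9-11 in category 2, 12 in category 3.
conn.executemany("INSERT INTO so_questions VALUES (?, ?, ?)",
                 [(i, 1 if i <= 8 else 2 if i <= 11 else 3, f"Question {i}")
                  for i in range(1, 13)])

def pick_with_temp_table():
    conn.execute("BEGIN TRANSACTION")
    # One random question id per category, materialized once.
    conn.execute("""
        CREATE TEMPORARY TABLE constrained AS
        SELECT (SELECT id FROM so_questions
                WHERE category_id = cats.category_id
                ORDER BY random() LIMIT 1) AS id
        FROM (SELECT DISTINCT category_id FROM so_questions) AS cats""")
    # Constrained ids first, then the rest in random order; keep five.
    rows = conn.execute("""
        SELECT ids.id, category_id, question_text
        FROM (SELECT id
              FROM (SELECT id, 1 AS c FROM constrained
                    UNION ALL
                    SELECT id, 0 AS c FROM so_questions
                    WHERE id NOT IN (SELECT id FROM constrained))
              ORDER BY c DESC, random()
              LIMIT 5) AS ids
        JOIN so_questions ON ids.id = so_questions.id""").fetchall()
    conn.execute("DROP TABLE constrained")
    conn.execute("COMMIT TRANSACTION")
    return rows

print(pick_with_temp_table())
```

    Materializing the picks in a table is what makes them stable: both the UNION ALL arm and the NOT IN filter read the same stored ids instead of re-running a random() subquery.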
    
