Writing a Complex MySQL Query

误落风尘 2021-01-17 06:26

Note: you can find my previous question and its answer here - MySQL: Writing a complex query


I have 3 tables.

Table Words_Learned contains

3 Answers
  • So, I think this is it. You want the "best" 100 articles, where an article ranks higher the later its most recently learnt word was learnt. So I look for each article's last-learnt word (the max(words_learned.order) per article), then show the article IDs in that order and stop at 100.

    select w.idarticle, max(l.`order`)
    from words w
    join words_learned l on l.idwords = w.idwords and l.userid = 123
    group by w.idarticle
    order by max(l.`order`) desc
    limit 100;
    

    You have edited your request. You now want to limit the results to articles that contain no more than ten unknown words. To do so, you must outer-join the learnt words so that you can count the unknown words (i.e. the outer-joined records with no match). Use HAVING to remove the undesired articles from the list.

    select w.idarticle, max(l.`order`)
    from words w
    left join words_learned l on l.idwords = w.idwords and l.userid = 123
    group by w.idarticle
    having sum(l.idwords is null) <= 10 and max(l.`order`) is not null
    order by max(l.`order`) desc
    limit 100;
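
    To try the HAVING filter without a MySQL server, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the schema (words(idwords, idarticle), words_learned(idwords, userid, `order`)), the helper name best_articles, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def best_articles(conn, user_id, max_unknown=10, limit=100):
    """Articles with at most `max_unknown` unknown words, latest-learnt word first."""
    sql = """
        select w.idarticle, max(l.`order`)
        from words w
        left join words_learned l
               on l.idwords = w.idwords and l.userid = ?
        group by w.idarticle
        having sum(l.idwords is null) <= ? and max(l.`order`) is not null
        order by max(l.`order`) desc
        limit ?
    """
    return conn.execute(sql, (user_id, max_unknown, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table words (idwords integer, idarticle integer);
    create table words_learned (idwords integer, userid integer, `order` integer);
""")

# Hypothetical user 123 learnt word 1 first and word 2 second.
conn.executemany("insert into words_learned values (?, ?, ?)",
                 [(1, 123, 1), (2, 123, 2)])

# idarticle -> (learnt word ids in the article, number of unknown words)
articles = {1: ([1, 2], 3), 2: ([1], 11), 3: ([], 2), 4: ([1], 0)}
for idarticle, (known, unknown_count) in articles.items():
    ids = known + [100 + i for i in range(unknown_count)]  # ids >= 100 are never learnt
    conn.executemany("insert into words values (?, ?)",
                     [(idwords, idarticle) for idwords in ids])

result = best_articles(conn, 123)
print(result)
```

    In this sample, article 2 is dropped for having eleven unknown words and article 3 because it contains no learnt word at all, which is what the "max(...) is not null" condition is for.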
    
  • 2021-01-17 06:49

    I would be tempted to use a subquery that takes all the words a person has learned and joins that set against itself, collecting for each word the later-learned words with GROUP_CONCAT, along with a count. That gives:

    Octopus, NULL, 0
    Dog, "Octopus", 1
    Spoon, "Octopus,Dog", 2
    

    So the sub query would be something like:-

    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
    

    giving

    idwords    excl_words    older_words_cnt
    1          NULL          0
    2          1             1
    3          1,2           2
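
    As a quick check of the self-join, the following sketch runs the subquery in an in-memory SQLite database from Python. SQLite stands in for MySQL, and the table layout and sample rows (words 1, 2 and 3 learned by user 1 in reverse order, plus one row for another user) are assumptions chosen to match the output above. The order of ids inside GROUP_CONCAT is not guaranteed, so the sketch sorts them before use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words_learned (idwords INTEGER, userId INTEGER, order_learned INTEGER)"
)
conn.executemany(
    "INSERT INTO words_learned VALUES (?, ?, ?)",
    # (idwords, userId, order_learned); the last row belongs to a different user
    [(1, 1, 3), (2, 1, 2), (3, 1, 1), (9, 2, 1)],
)

rows = conn.execute("""
    SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
    FROM words_learned sub0
    LEFT OUTER JOIN words_learned sub1
    ON sub0.userId = sub1.userId
    AND sub0.order_learned < sub1.order_learned
    WHERE sub0.userId = 1
    GROUP BY sub0.idwords
""").fetchall()

# idwords -> (sorted list of ids with a larger order_learned, how many there are)
later_words = {
    idwords: (sorted(excl.split(",")) if excl else [], cnt)
    for idwords, excl, cnt in rows
}
for idwords in sorted(later_words):
    print(idwords, later_words[idwords])
```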
    

    Then join the results of this against the other tables, checking for articles where the main idwords matches but none of the others are found.

    Something like this (although not tested as no test data):-

    SELECT sub_words.idwords, words_inc.idArticle
    FROM
    (
        SELECT sub0.idwords, SUBSTRING_INDEX(GROUP_CONCAT(sub1.idwords), ',', 10) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
        ON sub0.userId = sub1.userId
        AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    ) sub_words
    INNER JOIN words words_inc
    ON sub_words.idwords = words_inc.idwords
    LEFT OUTER JOIN words words_exc
    ON words_inc.idArticle = words_exc.idArticle
    AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
    WHERE words_exc.idwords IS NULL
    ORDER BY older_words_cnt
    LIMIT 100 
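
    The LEFT OUTER JOIN ... IS NULL exclusion can be tried locally with the sketch below. It makes several assumptions: SQLite stands in for MySQL, FIND_IN_SET is not built into SQLite so a Python stand-in with roughly the same behaviour is registered, the SUBSTRING_INDEX cap is left out because SQLite lacks that function too, and the tables and rows are invented for the test.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, 0 if absent, NULL on NULL."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE words_learned (idwords INTEGER, userId INTEGER, order_learned INTEGER);
    CREATE TABLE words (idwords INTEGER, idArticle INTEGER);
    INSERT INTO words_learned VALUES (1, 1, 3), (2, 1, 2), (3, 1, 1);
    INSERT INTO words VALUES (1, 10), (2, 10), (2, 20), (2, 30), (3, 30), (3, 40);
""")

pairs = conn.execute("""
    SELECT sub_words.idwords, words_inc.idArticle
    FROM
    (
        SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
        ON sub0.userId = sub1.userId
        AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    ) sub_words
    INNER JOIN words words_inc
    ON sub_words.idwords = words_inc.idwords
    LEFT OUTER JOIN words words_exc
    ON words_inc.idArticle = words_exc.idArticle
    AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
    WHERE words_exc.idwords IS NULL
    ORDER BY older_words_cnt
    LIMIT 100
""").fetchall()

print(sorted(pairs))
```

    With these rows, the pairs (2, 10) and (3, 30) drop out because those articles also contain a word with a larger order_learned.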
    

    EDIT - updated to exclude articles with more than 10 words that are not already learned.

    SELECT sub_words.idwords, words_inc.idArticle,
    sub2.idArticle, sub2.count, sub2.content
    FROM
    (
        SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
        ON sub0.userId = sub1.userId
        AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    ) sub_words 
    INNER JOIN words words_inc
    ON sub_words.idwords = words_inc.idwords
    INNER JOIN
    (
        SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
        FROM Article a
        INNER JOIN words b
        ON a.idArticle = b.idArticle
        LEFT OUTER JOIN words_learned c
        ON b.idwords = c.idwords
        AND c.userId = 1
        GROUP BY a.idArticle, a.count, a.content
        HAVING unlearned_words_count < 10
    ) sub2
    ON words_inc.idArticle = sub2.idArticle
    LEFT OUTER JOIN words words_exc
    ON words_inc.idArticle = words_exc.idArticle
    AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
    WHERE words_exc.idwords IS NULL
    ORDER BY older_words_cnt
    LIMIT 100
    

    EDIT - attempt at commenting the above query:-

    This just selects the columns

    SELECT sub_words.idwords, words_inc.idArticle,
    sub2.idArticle, sub2.count, sub2.content
    FROM
    

    This sub query gets each of the words learnt, along with a comma separated list of the words with a larger order_learned. This is for a particular user id

    (
        SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
        ON sub0.userId = sub1.userId
        AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    ) sub_words 
    

    This join just fetches the articles in which those words (i.e. the learned words from the subquery above) are used.

    INNER JOIN words words_inc
    ON sub_words.idwords = words_inc.idwords
    

    This sub query gets the articles that contain fewer than 10 words not yet learnt by the particular user.

    INNER JOIN
    (
        SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
        FROM Article a
        INNER JOIN words b
        ON a.idArticle = b.idArticle
        LEFT OUTER JOIN words_learned c
        ON b.idwords = c.idwords
        AND c.userId = 1
        GROUP BY a.idArticle, a.count, a.content
        HAVING unlearned_words_count < 10
    ) sub2
    ON words_inc.idArticle = sub2.idArticle
    

    This join finds articles that contain words from the comma-separated list built by the first sub query (i.e. words with a larger order_learned). It is done as a LEFT OUTER JOIN because I want to exclude any rows where such a word is found (this is done in the WHERE clause by checking for NULL).

    LEFT OUTER JOIN words words_exc
    ON words_inc.idArticle = words_exc.idArticle
    AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
    WHERE words_exc.idwords IS NULL
    ORDER BY older_words_cnt
    LIMIT 100
    
  • 2021-01-17 06:59

    I've read the question again and noticed that it is much more complicated.

    First of all you want to show words. And whether you show a word depends on that word and all before-learned words (and the articles they appear in).

    So with these words learned:

    word      order
    Octopus   3
    Dog       2
    Spoon     1 (i.e. first learned)
    

    And these articles:

    article  contains Octopus   contains Dog   contains spoon   unknown words
    A             yes               yes             yes              5
    B             yes               yes             no              11
    C             yes               no              yes             15
    D             no                yes             yes              2
    E             no                yes             no               0
    F             no                no              yes              8
    G             no                no              no               3
    H             no                no              no              20
    

    You ...

    • check "Octopus" and dismiss it because of article B or C.
    • check "Dog" and keep it, because articles D and E are okay and B must be ignored (as it contains "Octopus").
    • check "Spoon" and keep it, article F is okay and C must be ignored (as it contains "Octopus").

    So you show "Dog" and "Spoon" and not "Octopus". And if there were not just two matches but a thousand, you would show the first 100 and then stop.
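
    The walk-through above can be sketched in plain Python. This is only an illustration of the algorithm as described, not the query itself; the function name words_to_show and the data layout are made up here, while the words, articles and unknown-word counts are copied from the tables above.

```python
# Learned words and the order in which they were learned (3 = learned last).
learned = {"Octopus": 3, "Dog": 2, "Spoon": 1}

# article -> (learned words it contains, number of unknown words)
articles = {
    "A": ({"Octopus", "Dog", "Spoon"}, 5),
    "B": ({"Octopus", "Dog"}, 11),
    "C": ({"Octopus", "Spoon"}, 15),
    "D": ({"Dog", "Spoon"}, 2),
    "E": ({"Dog"}, 0),
    "F": ({"Spoon"}, 8),
    "G": (set(), 3),
    "H": (set(), 20),
}


def words_to_show(learned, articles, max_unknown=10, limit=100):
    """Keep a word unless some article whose last-learned word is that word is too hard."""
    shown = []
    # Check the last-learned word first, as in the walk-through.
    for word in sorted(learned, key=learned.get, reverse=True):
        blocked = any(
            unknown > max_unknown
            for known, unknown in articles.values()
            # An article only counts for the latest-learned word it contains.
            if known and max(known, key=learned.get) == word
        )
        if not blocked:
            shown.append(word)
    return shown[:limit]


print(words_to_show(learned, articles))  # ['Dog', 'Spoon']
```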

    Given this algorithm, we can conclude:

    • As long as too few words have been learned, no results will be shown at all.
    • At some point enough words will have been learned to find articles with fewer than 11 unknown words. The last-learned words will probably not be shown (like "Octopus" in the example), because there are still many articles with too many unknown words. But earlier words will be shown (because the last-learned words filter away the hard-to-read articles).
    • Then some day most of the words will be learned. Then it is the last-learned words that will be shown.

    The query:

    select idwords
    from words_learned
    where userid = 123
    and not exists
    (
      select w.idarticle
      from words w
      left join words_learned l on l.idwords = w.idwords and l.userid = 123
      group by w.idarticle
      having sum(l.idwords is null) > 10 and max(l.`order`) = words_learned.`order`
    )
    order by `order` desc
    limit 100;
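
    For anyone who wants to run this locally, the sketch below loads the example articles A to H into an in-memory SQLite database and executes the query from Python. SQLite is an assumed stand-in for MySQL, and the schema, the word ids (1 = Octopus, 2 = Dog, 3 = Spoon) and the ids used for unknown words are invented for the test.

```python
import sqlite3

QUERY = """
    select idwords
    from words_learned
    where userid = :user
    and not exists
    (
      select w.idarticle
      from words w
      left join words_learned l on l.idwords = w.idwords and l.userid = :user
      group by w.idarticle
      having sum(l.idwords is null) > 10 and max(l.`order`) = words_learned.`order`
    )
    order by `order` desc
    limit 100
"""

names = {1: "Octopus", 2: "Dog", 3: "Spoon"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table words (idwords integer, idarticle text);
    create table words_learned (idwords integer, userid integer, `order` integer);
""")
conn.executemany("insert into words_learned values (?, ?, ?)",
                 [(1, 123, 3), (2, 123, 2), (3, 123, 1)])

# article -> (ids of learned words it contains, number of unknown words)
articles = {
    "A": ([1, 2, 3], 5), "B": ([1, 2], 11), "C": ([1, 3], 15), "D": ([2, 3], 2),
    "E": ([2], 0), "F": ([3], 8), "G": ([], 3), "H": ([], 20),
}
for idarticle, (known, unknown_count) in articles.items():
    ids = known + [100 + i for i in range(unknown_count)]  # ids >= 100 are never learned
    conn.executemany("insert into words values (?, ?)",
                     [(idwords, idarticle) for idwords in ids])

shown = [names[row[0]] for row in conn.execute(QUERY, {"user": 123})]
print(shown)
```

    With this data the query should return "Dog" and "Spoon" and leave out "Octopus", matching the walk-through.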
    

    Here is an SQL fiddle: http://sqlfiddle.com/#!2/19bf8/1.
