Note: you can find my previous question and its answer here - MySQL: Writing a complex query
I have 3 tables.
Table Words_Learned
contains
So, I think this is it. You want to get the "best" 100 articles, where an article is better the later its most recently learnt word was learnt. So I look for each article's last learnt word (the max(words_learned.order) per article). Then I show the article IDs in that order and stop at 100.
select w.idarticle, max(l.`order`)
from words w
join words_learned l on l.idwords = w.idwords and l.userid = 123
group by w.idarticle
order by max(l.`order`) desc
limit 100;
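To try the query on a handful of rows, here is a rough sketch that runs it through Python's built-in sqlite3 module instead of MySQL. That swap is my assumption for the sake of a self-contained test (SQLite accepts the backtick quoting), and the table layout and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema: only the columns the query touches.
    CREATE TABLE words (idwords INTEGER, idarticle TEXT);
    CREATE TABLE words_learned (idwords INTEGER, userid INTEGER, `order` INTEGER);

    -- User 123 learned word 1 first, then word 2, then word 3.
    INSERT INTO words_learned VALUES (1, 123, 1), (2, 123, 2), (3, 123, 3);

    -- Article A contains words 2 and 3, B contains 2, C contains 1,
    -- D contains only word 9, which the user has not learned.
    INSERT INTO words VALUES (3, 'A'), (2, 'A'), (2, 'B'), (1, 'C'), (9, 'D');
""")

rows = conn.execute("""
    SELECT w.idarticle, MAX(l.`order`)
    FROM words w
    JOIN words_learned l ON l.idwords = w.idwords AND l.userid = 123
    GROUP BY w.idarticle
    ORDER BY MAX(l.`order`) DESC
    LIMIT 100
""").fetchall()

print(rows)  # [('A', 3), ('B', 2), ('C', 1)]
```

Article D drops out because the inner join finds no learnt word for it.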
You have edited your request. You want to limit the results to articles that contain no more than ten unknown words. In order to do so you must now outer-join the learnt words, so you can count the unknown words (i.e. the outer-joined records without a match). Use HAVING to remove undesired articles from the list. The additional max(...) is not null condition drops articles that contain no learnt word at all.
select w.idarticle, max(l.`order`)
from words w
left join words_learned l on l.idwords = w.idwords and l.userid = 123
group by w.idarticle
having sum(l.idwords is null) <= 10 and max(l.`order`) is not null
order by max(l.`order`) desc
limit 100;
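The effect of the outer join plus HAVING can be checked with a small sketch like the one below. It again uses Python's sqlite3 module as a stand-in for MySQL (my assumption), takes the user column to be userid as in the first query, and fills the tables with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema: only the columns the query touches.
    CREATE TABLE words (idwords INTEGER, idarticle TEXT);
    CREATE TABLE words_learned (idwords INTEGER, userid INTEGER, `order` INTEGER);
    INSERT INTO words_learned VALUES (1, 123, 1), (2, 123, 2), (3, 123, 3);
""")

# article -> (learned word ids it contains, number of unknown words); made-up data
articles = {"A": ([1, 2, 3], 5), "B": ([2, 3], 11), "D": ([1, 2], 2), "G": ([], 3)}
next_id = 100  # ids from 100 upwards stand for words the user has not learned
for article, (known, unknown) in articles.items():
    ids = known + list(range(next_id, next_id + unknown))
    next_id += unknown
    conn.executemany("INSERT INTO words VALUES (?, ?)", [(i, article) for i in ids])

rows = conn.execute("""
    SELECT w.idarticle, MAX(l.`order`)
    FROM words w
    LEFT JOIN words_learned l ON l.idwords = w.idwords AND l.userid = 123
    GROUP BY w.idarticle
    HAVING SUM(l.idwords IS NULL) <= 10 AND MAX(l.`order`) IS NOT NULL
    ORDER BY MAX(l.`order`) DESC
    LIMIT 100
""").fetchall()

print(rows)  # [('A', 3), ('D', 2)]
```

B is removed for having 11 unknown words, and G because it contains no learnt word.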
I would be tempted to use a sub query that gets all the words a person has learned and joins that against itself, so that each word comes with a GROUP_CONCAT list of the words learned after it, along with a count of them. So giving:-
Octopus, NULL, 0
Dog, "Octopus", 1
Spoon, "Octopus,Dog", 2
So the sub query would be something like:-
SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords
giving
idwords excl_words older_words_cnt
1 NULL 0
2 1 1
3 1,2 2
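A quick way to reproduce that output is the sketch below, run through Python's sqlite3 module rather than MySQL (my assumption, SQLite also has GROUP_CONCAT). The three rows are invented so that word 1 has the largest order_learned, matching the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal table: only the columns the sub query touches.
    CREATE TABLE words_learned (idwords INTEGER, userId INTEGER, order_learned INTEGER);
    INSERT INTO words_learned VALUES (1, 1, 3), (2, 1, 2), (3, 1, 1);
""")

# Map each learned word to (list of words with a larger order_learned, their count).
result = {
    idwords: (excl_words, cnt)
    for idwords, excl_words, cnt in conn.execute("""
        SELECT sub0.idwords,
               GROUP_CONCAT(sub1.idwords) AS excl_words,
               COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
          ON sub0.userId = sub1.userId
         AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    """)
}

for idwords in sorted(result):
    print(idwords, *result[idwords])
```

The order of the ids inside excl_words is not guaranteed, so nothing should rely on it.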
Then join the results of this against the other tables, checking for articles where the main idwords matches but none of the others are found.
Something like this (although not tested as no test data):-
SELECT sub_words.idwords, words_inc.idArticle
FROM
(
SELECT sub0.idwords, SUBSTRING_INDEX(GROUP_CONCAT(sub1.idwords), ',', 10) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords
) sub_words
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100
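To see what the anti-join keeps, here is a sketch on invented rows. It runs on Python's sqlite3 module instead of MySQL (my assumption), so FIND_IN_SET is registered as a small Python function that only approximates the MySQL built-in, and the sub query uses the plain GROUP_CONCAT form without SUBSTRING_INDEX, which SQLite lacks.

```python
import sqlite3


def find_in_set(needle, csv):
    """Approximation of MySQL FIND_IN_SET: 1-based position, 0 if absent, NULL on NULL input."""
    if needle is None or csv is None:
        return None
    items = csv.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    -- Hypothetical minimal schema and rows.
    CREATE TABLE words (idwords INTEGER, idArticle TEXT);
    CREATE TABLE words_learned (idwords INTEGER, userId INTEGER, order_learned INTEGER);
    INSERT INTO words_learned VALUES (1, 1, 3), (2, 1, 2), (3, 1, 1);
    INSERT INTO words VALUES (1, 'A'), (2, 'A'), (3, 'A'), (2, 'D'), (3, 'D'), (3, 'F');
""")

rows = conn.execute("""
    SELECT sub_words.idwords, words_inc.idArticle
    FROM (
        SELECT sub0.idwords,
               GROUP_CONCAT(sub1.idwords) AS excl_words,
               COUNT(sub1.idwords) AS older_words_cnt
        FROM words_learned sub0
        LEFT OUTER JOIN words_learned sub1
          ON sub0.userId = sub1.userId
         AND sub0.order_learned < sub1.order_learned
        WHERE sub0.userId = 1
        GROUP BY sub0.idwords
    ) sub_words
    INNER JOIN words words_inc
      ON sub_words.idwords = words_inc.idwords
    LEFT OUTER JOIN words words_exc
      ON words_inc.idArticle = words_exc.idArticle
     AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
    WHERE words_exc.idwords IS NULL
    ORDER BY older_words_cnt
    LIMIT 100
""").fetchall()

print(rows)  # [(1, 'A'), (2, 'D'), (3, 'F')]
```

Each article ends up paired only with the word in it that has the largest order_learned, because every other pairing finds a match in excl_words and is dropped by the IS NULL check.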
EDIT - updated to exclude articles with more than 10 words that are not already learned.
SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM
(
SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords
) sub_words
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
INNER JOIN
(
SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
FROM Article a
INNER JOIN words b
ON a.idArticle = b.idArticle
LEFT OUTER JOIN words_learned c
ON b.idwords = c.idwords
AND c.userId = 1
GROUP BY a.idArticle, a.count, a.content
HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100
EDIT - attempt at commenting the above query:-
This just selects the columns
SELECT sub_words.idwords, words_inc.idArticle,
sub2.idArticle, sub2.count, sub2.content
FROM
This sub query gets each of the words learnt, along with a comma separated list of the words with a larger order_learned. This is for a particular user id
(
SELECT sub0.idwords, GROUP_CONCAT(sub1.idwords) AS excl_words, COUNT(sub1.idwords) AS older_words_cnt
FROM words_learned sub0
LEFT OUTER JOIN words_learned sub1
ON sub0.userId = sub1.userId
AND sub0.order_learned < sub1.order_learned
WHERE sub0.userId = 1
GROUP BY sub0.idwords
) sub_words
This just gets the articles in which those words (ie, the words learned from the above sub query) are used
INNER JOIN words words_inc
ON sub_words.idwords = words_inc.idwords
This sub query gets the articles which have fewer than 10 words in them that are not yet learnt by the particular user.
INNER JOIN
(
SELECT a.idArticle, a.count, a.content, SUM(IF(c.idwords_learned IS NULL, 1, 0)) AS unlearned_words_count
FROM Article a
INNER JOIN words b
ON a.idArticle = b.idArticle
LEFT OUTER JOIN words_learned c
ON b.idwords = c.idwords
AND c.userId = 1
GROUP BY a.idArticle, a.count, a.content
HAVING unlearned_words_count < 10
) sub2
ON words_inc.idArticle = sub2.idArticle
This join is to find articles that have words in the comma separated list from the 1st sub query (ie words with a larger order_learned). This is done as a LEFT OUTER JOIN because I want to exclude any rows where such words are found (this is done in the WHERE clause by checking for NULL)
LEFT OUTER JOIN words words_exc
ON words_inc.idArticle = words_exc.idArticle
AND FIND_IN_SET(words_exc.idwords, sub_words.excl_words)
WHERE words_exc.idwords IS NULL
ORDER BY older_words_cnt
LIMIT 100
I've read the question again and noticed it is much more complicated.
First of all you want to show words. And whether you show a word depends on that word and all words learned before it (and the articles they appear in).
So with these words learned:
word     order
Octopus  3
Dog      2
Spoon    1 (i.e. first learned)
And these articles:
article  contains Octopus  contains Dog  contains Spoon  unknown words
A        yes               yes           yes             5
B        yes               yes           no              11
C        yes               no            yes             15
D        no                yes           yes             2
E        no                yes           no              0
F        no                no            yes             8
G        no                no            no              3
H        no                no            no              20
You ...
So you show "Dog" and "Spoon" and not "Octopus". And if there were not only two matches, but a thousand, you would show the first 100 and then stop.
Given this algorithm, we can conclude:
The query:
select idwords
from words_learned
where userid = 123
and not exists
(
select w.idarticle
from words w
left join words_learned l on l.idwords = w.idwords and l.userid = 123
group by w.idarticle
having sum(l.idwords is null) > 10 and max(l.`order`) = words_learned.`order`
)
order by `order` desc
limit 100;
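As a check against the example data above (articles A to H, with Octopus, Dog and Spoon learned in orders 3, 2 and 1), here is a sketch that loads that data and runs the query. It uses Python's sqlite3 module rather than MySQL, which is my assumption for a self-contained test. The numeric ids and the wl alias on the outer table are also mine, added only to make the correlated reference explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema: only the columns the query touches.
    CREATE TABLE words (idwords INTEGER, idarticle TEXT);
    CREATE TABLE words_learned (idwords INTEGER, userid INTEGER, `order` INTEGER);
    INSERT INTO words_learned VALUES (3, 123, 3), (2, 123, 2), (1, 123, 1);
""")
names = {1: "Spoon", 2: "Dog", 3: "Octopus"}  # made-up ids for the three learned words

# article -> (learned word ids it contains, unknown-word count), as in the article table
articles = {
    "A": ([3, 2, 1], 5), "B": ([3, 2], 11), "C": ([3, 1], 15), "D": ([2, 1], 2),
    "E": ([2], 0), "F": ([1], 8), "G": ([], 3), "H": ([], 20),
}
next_id = 100  # ids from 100 upwards stand for the unknown words
for article, (known, unknown) in articles.items():
    ids = known + list(range(next_id, next_id + unknown))
    next_id += unknown
    conn.executemany("INSERT INTO words VALUES (?, ?)", [(i, article) for i in ids])

shown = [names[i] for (i,) in conn.execute("""
    SELECT idwords
    FROM words_learned wl
    WHERE userid = 123
      AND NOT EXISTS (
        SELECT w.idarticle
        FROM words w
        LEFT JOIN words_learned l ON l.idwords = w.idwords AND l.userid = 123
        GROUP BY w.idarticle
        HAVING SUM(l.idwords IS NULL) > 10 AND MAX(l.`order`) = wl.`order`
      )
    ORDER BY `order` DESC
    LIMIT 100
""")]

print(shown)  # ['Dog', 'Spoon']
```

Octopus is rejected because article B has it as its latest learnt word and contains 11 unknown words.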
Here is an SQL fiddle: http://sqlfiddle.com/#!2/19bf8/1.