MySQL - Select data with a JOIN but with WHERE clauses applying to complex and distinct perimeters

点点圈 提交于 2019-12-13 08:04:08

问题


I have a MySQL 5.6 database and fail to make a complex query, which (i think) involve multiple joins.

The first table (T1) is Hobby_ideas_articles and records looks like this:

hobby_idea_article_id= 1,
hobby_id = 6
url= 'http://exo.example.com',
author = 'john@example.com'

hobby_idea_article_id= 2,
hobby_id = 3
url= 'http://exo.example2.com',
author = 'john@example.com'

hobby_idea_article_id= 3,
hobby_id = 6
url= 'http://exo.example3.com',
author = 'eric@example.com'

hobby_idea_article_id= 4,
hobby_id = 7
url= 'http://exo.example4.com',
author = 'john@example.com'

hobby_idea_article_id= 5,
hobby_id = 6
url= 'http://exo.example5.com',
author = 'nestor@example.com'

hobby_idea_article_id= 7,
hobby_id = 6
url= 'http://exo.example6.com',
author = 'ginger@example.com'

hobby_idea_article_id= 8,
hobby_id = 6
url= 'http://exo.example8.com',
author = 'derek@example.com'

hobby_idea_article_id= 9,
hobby_id = 6
url= 'http://exo.example9.com',
author = 'derek@example.com'

The second table (T2) called Past_Customer_sent_messages lists ALL messages from the past, notably with the information of who sent it (customer_id -> pointing to another table called Customers which is not needed in this current question) and to whom (recipient), and records look like this

past_customer_sent_message_id = 5
hobby_id = 7,
customer_id = 4,
recipient = "john@example.com",
sent_at= "2019-09-10 00:00:00"

past_customer_sent_message_id = 6
hobby_id = 999,
customer_id = 4,
recipient = "eric@example.com",
sent_at= "2019-09-18 00:00:00"

past_customer_sent_message_id = 7
hobby_id = 999,
customer_id = 4,
recipient = "nestor@example.com",
sent_at= "2019-07-18 00:00:00"

Even if there is currently no foreign key between T1 and T2, one critical information is that the T1 'author' is the 'recipient" that will appear on T2 if a customer sends a message to him.

I'm not sure if it matters for the present question, but there is a Customer table (a past_customer_sent_messages belongs to a customer, a customer has multiple past_customer_sent_messages) and a Hobbies table (a Hobby_ideas_article belongs to a hobby but a hobby can have multiple Hobby_ideas_articles).

In my javascript file, before writing the SQL statement, I have 2 inputs with already given values I'll need to use at some point in the SQL statement:

  • hobby_id = HOBBY_ID_INPUT_I_HAVE , for example 6

  • customer_id = CUSTOMER_ID_INPUT_I_HAVE, for example 4

What I want is to retrieve ALL the records from the Hobby_ideas_articles table (T1) where hobby_id = HOBBY_ID_INPUT_I_HAVE with 2 conditions/restrictions:

  • condition1: the recipient (ex: 'john@example.com'), no matter WHAT (i.e no matter the hobby_id-S = plural it's connected to via 'author' column on T1, and no matter the customer_id-S = plural it's connected to via 'recipient' on T2), has never been sent an message (sent_at column on T2) in the past 3 hours. Again, this applies across all past messages to this 'recipient' (very important it means not only on the T1 'authors' that would match hobby_id = HOBBY_ID_INPUT_I_HAVE = 6).

  • condition2: the author has not been messaged in the past 30 days by the customer who has customer_id = CUSTOMER_ID_INPUT_I_HAVE. Here it applies to this specific customer (but across any hobby!).

A part I find particularly challenging intellectually as I would be tempted to put as a WHERE hobby_id = HOBBY_ID_INPUT_I_HAVE (see below) but it feels wrong because then, how to do I implement the condition1 which applies to ALL the messages, and should not be applied to only those with hobby_id = HOBBY_ID_INPUT_I_HAVE...hard! Same kind of difficulty for condition2

Maybe I could/should add a new column called hobby_idea_id that I would put on T2 which would be a foreign key between the 2 tables as a Past_Customer_sent_message belongs in a way to a Hobby_ideas_article and a Hobby_ideas_article has many Past_Customer_sent_messages ?

Today I have this but I don't even know on what to put the 'ON' clause of the join...and can't set the correct statement.

SELECT             
          hia.query_result_id,
          hia.url,
          hia.title,
          hia.article_publication_day,
          hia.media_name,
          hia.author_full_name,
          hia.author_first_name,
          hia.author_last_name,
          hia.author_email,
          ces.sent_at
FROM
          Hobby_ideas_articles hia
LEFT JOIN
          Past_Customer_sent_messages ces
ON
          # WHAT CLAUSE ???
WHERE
          hia.hobby_id = HOBBY_ID_INPUT_I_HAVE AND         
          hia.author IS NOT NULL
          # more conditions but how to assert them ??

How to achieve this ?

The desired output would be:

hobby_idea_article_id= 5,
hobby_id = 6
url= 'http://exo.example5.com',
author = 'nestor@example.com'
last_email_sent_at = "2019-07-18 00:00:00"

hobby_idea_article_id= 7,
hobby_id = 6
url= 'http://exo.example6.com',
author = 'ginger@example.com'
last_email_sent_at = null

hobby_idea_article_id= 8,
hobby_id = 6
url= 'http://exo.example8.com',
author = 'derek@example.com'
last_email_sent_at = null

As you can see with my inputs values given as examples (hobby_id = 6 , customer_id = 4),

  • 'john@example.com' and any hobby_ideas related to him are not part of the output because john has been messaged by customer_id = 4 in the past 30 days already

  • 'Eric' and the hobby_ideas related to him could have been part of the output because Eric was messaged by customer_id=6 8 days ago (and not customer_id=4 which is the value we target), BUT unfortunately it was excluded/filtered out because of condition1 : he was messaged at some point 2 hours ago by another customer, so he's out too.

  • nestor and the hobby_idea related to him are inside the output because he's unaffected by both because he indeed is related to hobby_id= 6, but his only mail was sent a few months ago.

  • ginger and the hobby_idea related to her are inside the output because condition1 and2 are ok given that ginger never emailed anybody: ther's no message at all where she is the recipient

  • Also note that if an author would be in the output but has multiple hobby_ideas related to him, then i only want one, that's why derek only appear once with let's say the FIRST hobby_idea but the second one he's one is removed from the output

Disclaimer: I come from a Rails background where I used to have ORM (Active Record) making all the chains/joins/ easier more automagic and am a bit lost here with the present SQL statement I must write inside my javascript aws lambda file.


回答1:


Use recipient NOT IN (SELECT ...) and author NOT IN (SELECT ...) to exclude the participants that match your conditions 1 and 2.

SELECT             
          hia.query_result_id,
          hia.url,
          hia.title,
          hia.article_publication_day,
          hia.media_name,
          hia.author_full_name,
          hia.author_first_name,
          hia.author_last_name,
          hia.author_email,
          ces.sent_at
FROM
          Hobby_ideas_articles hia
LEFT JOIN
          Past_Customer_sent_messages ces
ON
          hia.author = ces.recipient 
            AND ces.recipient NOT IN (
                SELECT recipient
                FROM Past_Customer_sent_messages
                WHERE sent_at > DATE_SUB(NOW(), INTERVAL 3 HOUR)
            )
WHERE
          hia.hobby_id = HOBBY_ID_INPUT_I_HAVE AND         
          hia.author IS NOT NULL
          AND hia.author NOT IN (
            SELECT recipient
            FROM Past_Customer_sent_messages
            WHERE customer_id = CUSTOMER_ID_INPUT_I_HAVE
            AND sent_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
          )

The condition on ces has to be in the ON clause so that you'll get sent_at = NULL for articles that don't have any matching rows.



来源:https://stackoverflow.com/questions/58001908/mysql-select-data-with-a-join-but-with-where-clauses-applying-to-complex-and-d

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!