问题
I want to write an SQL query calling for several columns with a bit complicated conditions. I'm working on R Studio using RMySQL package. My server is MySQL.
The table looks like this.
organisation Tour_ID A B C D
Ikea a 2018-04-01 2018-05-07 2018-05-09 2018-05-01
Ikea a 2018-06-01 2018-05-03 2018-05-29 NA
Ikea a 2018-04-02 2018-05-01 2018-07-08 2018-05-26
Ikea b 2018-06-02 2018-05-01 NA 2018-05-26
Ikea b 2018-06-02 2018-05-01 NA 2018-05-26
Ikea b NA 2018-05-05 2018-08-02 2018-06-01
Ikea c 2018-06-01 2018-05-07 2018-05-09 2018-05-01
Ikea c 2018-06-01 2018-05-03 NA NA
Ikea c 2018-08-02 2018-05-09 2018-07-08 2018-05-26
This is what I want to do:
- filter the rows where
organisation = Ikea
groupby by
Tour_ID
like this:organisation Tour_ID A B C D Ikea a 2018-04-01 2018-05-07 2018-05-09 2018-05-01 Ikea a 2018-06-01 2018-05-03 2018-05-29 NA Ikea a 2018-04-02 2018-05-01 2018-07-08 2018-05-26 Ikea b 2018-06-02 2018-05-01 NA 2018-05-26 Ikea b 2018-06-02 2018-05-01 NA 2018-05-26 Ikea b NA 2018-05-05 2018-08-02 2018-06-01 Ikea c 2018-06-01 2018-05-07 2018-05-09 2018-05-01 Ikea c 2018-06-01 2018-05-03 NA NA Ikea c 2018-08-02 2018-05-09 2018-07-08 2018-05-26
in each group of
Tour_ID
, look at the earliest date in columnsA
,B
,C
andD
. If the earliest date among the four columns in the group is between2018-05-01
and2018-05-31
, return the entire group. If a row containsNA
values, I want to ignore theNA
s and see what's the earliest date among the rest of the values. For example, for the group ofTour_ID = a
, the earliest date is2018-04-01
therefore it doesn't meet the criteria.
In conclusion, only the groups where Tour_ID = b
and Tour_ID = c
match the conditions. The result should be:
organisation Tour_ID A B C D
Ikea b 2018-06-02 2018-05-01 NA 2018-05-26
Ikea b 2018-06-02 2018-05-01 NA 2018-05-26
Ikea b NA 2018-05-05 2018-08-02 2018-06-01
Ikea c 2018-06-01 2018-05-07 2018-05-09 2018-05-01
Ikea c 2018-06-01 2018-05-03 NA NA
Ikea c 2018-08-02 2018-05-09 2018-07-08 2018-05-26
How should I write an SQL query? Here is my attempt, but I just don't know how to do groupby, and how to return the entire group not just the rows with the earliest date.
SELECT *
FROM myTable
WHERE organisation LIKE 'Ikea' AND
GROUP BY 'Tour_ID' AND
LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) >= '2018-05-01' AND
LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) < '2018-06-01';
('2019-01-01' is to replace NAs)
Thank you for any kinds of help!
ADDED: Following the answer by Gordon, here I rewrote the SQL statement.
"SELECT t.* FROM myTable JOIN (SELECT organisation, Tour_ID
FROM myTable
WHERE organisation LIKE 'Ikea' AND
GROUP BY organisation, Tour_ID
HAVING LEAST(COALESCE(MIN(A), '2119-01-01'),
COALESCE(MIN(B), '2119-01-01'),
COALESCE(MIN(C), '2119-01-01'),
COALESCE(MIN(D), '2119-01-01')) >= '2018-05-01' AND
LEAST(COALESCE(MIN(A), '2119-01-01'),
COALESCE(MIN(B), '2119-01-01'),
COALESCE(MIN(C), '2119-01-01'),
COALESCE(MIN(D), '2119-01-01')) < '2018-06-01'
) tt
ON tt.Tour_ID = t.Tour_ID AND
tt.organisation = t.organisation"
And I ran dbGetQuery
from RMySQL package. But I get the following error. I don't understand because GROUP BY
part seems quite okay. Does anyone know why I'm getting this error?
dbGetQuery(connection = connection, statement = condition)
Error in .local(conn, statement, ...) : could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'GROUP BY organisation, Tour_ID HAVING LEAST(COALESCE(A' at line 1
回答1:
First get the tour_id
s that match the conditions:
SELECT Tour_ID
FROM myTable
WHERE organisation LIKE 'Ikea'
GROUP BY Tour_ID
HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01';
Then put this into a query to get the original rows. Here is one way:
select t.*
from mytable t join
(SELECT organisation, Tour_ID
FROM myTable
WHERE organisation LIKE 'Ikea'
GROUP BY organisation, Tour_ID
HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01'
) tt
ON tt.tour_id = t.tour_id AND
tt.organisation = t.organisation;
来源:https://stackoverflow.com/questions/51079281/sql-query-for-group-by-return-groups-that-match-the-conditions-of-least-coales