Question
I am going to join three tables and then sum one column multiplied by a value from another.
SELECT t1.column, t2.column, SUM(t1.column * t2.column)
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.id
JOIN table3 t3
ON t2.id = t3.id
GROUP BY t1.column, t2.column;
This query does what I want, BUT I do not understand why the GROUP BY works.
If I add columns to the SELECT, must I also add them to the GROUP BY?
Answer 1:
Do you actually know what you are doing here?
SELECT t1.column, t2.column, SUM(t1.column * t2.column)
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
JOIN table3 t3 ON t2.id = t3.id
GROUP BY t1.column, t2.column;
The query is very suspicious in at least 2 ways:
1. Table3 is not used, except to verify that a record exists in t3 for the id in t2. Did you want that?
   Potential pitfall: if there are multiple t3 records per t2 record, you will get a cartesian product, an unintended multiplication of the SUM column.
2. GROUP BY t1.column, t2.column: this combines all the unique combinations of (t1.column, t2.column), and sums the result of t1.column * t2.column across them. Is this really what you are after?
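The pitfall in point 1 can be reproduced with a small sketch. It assumes SQLite through Python's sqlite3 module, and the column names qty, price and note are hypothetical stand-ins for the placeholder "column" in the query above; other databases should behave the same way for this join, but that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, qty INTEGER);    -- hypothetical columns
    CREATE TABLE table2 (id INTEGER, price INTEGER);
    CREATE TABLE table3 (id INTEGER, note TEXT);
    INSERT INTO table1 VALUES (1, 2);
    INSERT INTO table2 VALUES (1, 3);
    -- two table3 rows for the same id: the join repeats the t1/t2 row twice
    INSERT INTO table3 VALUES (1, 'a'), (1, 'b');
""")

query_without_t3 = """
    SELECT t1.qty, t2.price, SUM(t1.qty * t2.price)
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.id
    GROUP BY t1.qty, t2.price
"""
query_with_t3 = """
    SELECT t1.qty, t2.price, SUM(t1.qty * t2.price)
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.id
    JOIN table3 t3 ON t2.id = t3.id
    GROUP BY t1.qty, t2.price
"""

rows_without_t3 = conn.execute(query_without_t3).fetchall()
rows_with_t3 = conn.execute(query_with_t3).fetchall()
print(rows_without_t3)  # [(2, 3, 6)]
print(rows_with_t3)     # [(2, 3, 12)]
```

The only difference between the two queries is the join to table3, yet the sum doubles because the single t1/t2 row is counted once per matching t3 row.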
For point 2, consider this (source) data:
t1.id  t1.column  t2.column  t1.column*t2.column
1      2          3          6
2      2          3          6
3      3          3          9
4      3          4          12
You end up with the output
t1.column  t2.column  SUM(t1.column*t2.column)
2          3          12
3          3          9
3          4          12
Notice that the two (2, 3) rows have been combined into a single row whose sum is 6 + 6 = 12.
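To check this result yourself, here is a minimal sketch that loads the same four source rows and runs the grouped query. It assumes SQLite through Python's sqlite3 module; the column name col is a hypothetical replacement for the placeholder "column", and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col INTEGER);  -- col stands in for "column"
    CREATE TABLE table2 (id INTEGER, col INTEGER);
    INSERT INTO table1 VALUES (1, 2), (2, 2), (3, 3), (4, 3);
    INSERT INTO table2 VALUES (1, 3), (2, 3), (3, 3), (4, 4);
""")

grouped = conn.execute("""
    SELECT t1.col, t2.col, SUM(t1.col * t2.col)
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.id
    GROUP BY t1.col, t2.col
    ORDER BY t1.col, t2.col
""").fetchall()

for row in grouped:
    print(row)
# (2, 3, 12)
# (3, 3, 9)
# (3, 4, 12)
```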
"If I add columns to the select must I also add columns to the group by?"
Columns in the SELECT (with the exception of some DBMSs, such as MySQL) have to be either an aggregate (e.g. SUM/AVG/MIN/MAX) or a column in the GROUP BY clause. You can also use other expressions that do not come directly from the tables, such as scalar functions or constant values.
If you actually need more columns from the table related to the aggregates, you need to think clearly about why. For example, if you are grouping by column1 and averaging column2, what are you trying to do with column3? Which row should it come from?
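One way to answer the "which row?" question is to say explicitly which value you want, for example by wrapping column3 in an aggregate of its own. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical table t with made-up data; MIN and MAX are just two possible choices, not the only ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (column1 TEXT, column2 INTEGER, column3 TEXT);  -- hypothetical
    INSERT INTO t VALUES ('a', 10, 'x'), ('a', 20, 'y'), ('b', 30, 'z');
""")

# column3 is not in the GROUP BY, so each group may hold several values;
# MIN/MAX state explicitly which one should be returned.
summary = conn.execute("""
    SELECT column1, AVG(column2), MIN(column3), MAX(column3)
    FROM t
    GROUP BY column1
    ORDER BY column1
""").fetchall()

print(summary)  # [('a', 15.0, 'x', 'y'), ('b', 30.0, 'z', 'z')]
```

Group 'a' contains both 'x' and 'y' in column3, which is exactly why a bare column3 would be ambiguous there.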
Answer 2:
That is because SUM is an aggregate function that is calculated over the rows of each group.
Answer 3:
Don't worry about the JOINs at first. To understand GROUP BY, first look at a very simple query.
SELECT t1.year, t1.person
FROM table t1
This would return
year | person
2000 | Joe
2000 | Betty
2000 | Marty
2001 | Joe
2002 | Betty
If you throw in an aggregate function, you have to include a GROUP BY for everything not covered by an aggregate function.
SELECT t1.year, COUNT(t1.person) as counter
FROM table t1
GROUP BY t1.year
year | counter
2000 | 3
2001 | 1
2002 | 1
If you don't include the GROUP BY, it doesn't work because the database literally does not know how you want to group your data.
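The example can be run as a small sketch, assuming SQLite through Python's sqlite3 module. The table is named people here (a hypothetical name) because "table" itself is a reserved word. The second query is an addition for contrast: an aggregate with no GROUP BY and no plain column treats the whole table as one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (year INTEGER, person TEXT);  -- hypothetical table name
    INSERT INTO people VALUES
        (2000, 'Joe'), (2000, 'Betty'), (2000, 'Marty'),
        (2001, 'Joe'), (2002, 'Betty');
""")

# With GROUP BY: one output row per distinct year.
per_year = conn.execute(
    "SELECT year, COUNT(person) AS counter FROM people GROUP BY year ORDER BY year"
).fetchall()

# Aggregate only, no GROUP BY: the whole table is a single group.
overall = conn.execute("SELECT COUNT(person) FROM people").fetchall()

print(per_year)  # [(2000, 3), (2001, 1), (2002, 1)]
print(overall)   # [(5,)]
```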
Answer 4:
When GROUP BY has more than one parameter, as in your case, it means: "First group by definition #1; if several rows share the same definition #1, split them further by definition #2; rows that share the same definition #2 as well are grouped together."
Answer 5:
Columns that are the target of aggregate functions do not have to be part of the GROUP BY clause. Aggregate functions are functions such as SUM, AVG, MIN, MAX etc.
Answer 6:
Because aggregate functions give you one return value per group: first the rows are sorted, then the aggregate operation is performed on each distinct set in the sorted result.
Source: https://stackoverflow.com/questions/5020022/joining-three-table-then-group