Joining three tables, then grouping

Question

I want to join three tables and then sum one column multiplied by a value from another table.

SELECT t1.column, t2.column, SUM(t1.column * t2.column)
FROM table1 t1 
     INNER JOIN table2 t2 
         ON t1.id = t2.id 
     JOIN table3 t3 
         ON t2.id = t3.id 
GROUP BY t1.column, t2.column;

This query does what I want, but I do not understand why the GROUP BY works.

If I add columns to the select must I also add columns to the group by?

Answer 1:

Do you actually know what you are doing here?

SELECT t1.column, t2.column, SUM(t1.column * t2.column)
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
JOIN table3 t3 ON t2.id = t3.id
GROUP BY t1.column, t2.column;

The query is very suspicious in at least 2 ways:

1. table3 is not used, except to verify that a record exists in t3 for the id in t2. Did you want that? Potential pitfall: if there are multiple t3 records per t2 record, each joined row is repeated once per matching t3 row, which unintentionally multiplies the SUM column.
2. GROUP BY t1.column, t2.column collapses the rows into one row per unique combination of (t1.column, t2.column) and sums t1.column * t2.column within each combination. Is this really what you are after?
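The pitfall in point 1 can be reproduced with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the column names qty and price are hypothetical stand-ins for the question's placeholder `column` (a reserved word), and the data is made up for illustration.

```python
import sqlite3

# In-memory database; qty and price are hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, qty INTEGER);
    CREATE TABLE table2 (id INTEGER, price INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 2);
    INSERT INTO table2 VALUES (1, 3);
    INSERT INTO table3 VALUES (1), (1);  -- two t3 rows for the same id
""")

query = """
    SELECT t1.qty, t2.price, SUM(t1.qty * t2.price)
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    {t3_join}
    GROUP BY t1.qty, t2.price
"""

# Same query, once without and once with the join to table3.
without_t3 = conn.execute(query.format(t3_join="")).fetchall()
with_t3 = conn.execute(
    query.format(t3_join="JOIN table3 t3 ON t2.id = t3.id")
).fetchall()

print(without_t3)  # [(2, 3, 6)]
print(with_t3)     # [(2, 3, 12)]
```

The only difference between the two runs is the join to table3: because two t3 rows match, the single (2, 3) row is counted twice and the sum doubles.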

For point 2, consider this (source) data:

t1.id, t1.column, t2.column, t1.column*t2.column
1      2          3          6
2      2          3          6
3      3          3          9
4      3          4          12

You end up with the output

t1.column, t2.column, SUM(t1.column*t2.column)
2          3          12
3          3          9
3          4          12

Note that the two source rows with (2, 3) have been combined into a single row whose sum is 6 + 6 = 12.
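To check the example, here is a minimal sketch that loads the sample data into an in-memory SQLite database and runs the grouped query. The column names qty and price are assumptions standing in for t1.column and t2.column, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, qty INTEGER);    -- qty plays the role of t1.column
    CREATE TABLE table2 (id INTEGER, price INTEGER);  -- price plays the role of t2.column
    INSERT INTO table1 VALUES (1, 2), (2, 2), (3, 3), (4, 3);
    INSERT INTO table2 VALUES (1, 3), (2, 3), (3, 3), (4, 4);
""")

rows = conn.execute("""
    SELECT t1.qty, t2.price, SUM(t1.qty * t2.price)
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    GROUP BY t1.qty, t2.price
    ORDER BY t1.qty, t2.price
""").fetchall()

print(rows)  # [(2, 3, 12), (3, 3, 9), (3, 4, 12)]
```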

"If I add columns to the select must I also add columns to the group by?"

Columns in the SELECT have to be either inside an aggregate (e.g. sum/avg/min/max) or listed in the GROUP BY clause. Some DBMSs relax this rule, for example MySQL when ONLY_FULL_GROUP_BY is disabled. (There are also other expressions you can use, such as scalar functions or constant values that do not come directly from the tables.)

If you actually need more columns from the table alongside the aggregates, you need to think clearly about why. For example, if you are grouping by column1 and averaging column2, what are you trying to do with column3? Which row should it come from?
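One way to answer the "which row?" question is to say so explicitly with another aggregate. The sketch below assumes a hypothetical table t with made-up data, and picks the smallest and largest column3 value per group; whether MIN or MAX (or something else) is right depends on what you actually need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and data, for illustration only.
    CREATE TABLE t (column1 TEXT, column2 REAL, column3 TEXT);
    INSERT INTO t VALUES
        ('x', 10, 'first'),
        ('x', 20, 'second'),
        ('y', 5,  'only');
""")

# column3 is wrapped in aggregates, so each group yields one well-defined value.
rows = conn.execute("""
    SELECT column1, AVG(column2), MIN(column3), MAX(column3)
    FROM t
    GROUP BY column1
    ORDER BY column1
""").fetchall()

print(rows)  # [('x', 15.0, 'first', 'second'), ('y', 5.0, 'only', 'only')]
```

Group 'x' contains two different column3 values, which is exactly why a bare column3 in the SELECT would be ambiguous there.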

Answer 2:

That is because SUM is an aggregate function: it is calculated once per group, over the rows in that group.

Answer 3:

Don't worry about the JOINs at first. To understand GROUP BY, first look at a very simple query.

SELECT t1.year, t1.person
FROM table t1

This would return

year | person
2000 | Joe
2000 | Betty
2000 | Marty
2001 | Joe
2002 | Betty

If you throw in an aggregate function, you have to include a GROUP BY that lists every selected column not wrapped in an aggregate function.

SELECT t1.year, COUNT(t1.person) as counter
FROM table t1
GROUP BY t1.year

year | counter
2000 | 3
2001 | 1
2002 | 1

If you don't include the GROUP BY, it doesn't work because the database literally does not know how you want to group your data.
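The counting example can be tried out with the sketch below. The table name attendance is an assumption (the placeholder name `table` is a reserved word and cannot be used as-is), and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'attendance' is a hypothetical name for the table in the example.
    CREATE TABLE attendance (year INTEGER, person TEXT);
    INSERT INTO attendance VALUES
        (2000, 'Joe'), (2000, 'Betty'), (2000, 'Marty'),
        (2001, 'Joe'), (2002, 'Betty');
""")

# One output row per distinct year, with the number of people in that year.
counts = conn.execute("""
    SELECT t1.year, COUNT(t1.person) AS counter
    FROM attendance t1
    GROUP BY t1.year
    ORDER BY t1.year
""").fetchall()

print(counts)  # [(2000, 3), (2001, 1), (2002, 1)]
```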

Answer 4:

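When GROUP BY has more than one column, as in your case, it means: "First group the rows by column #1; then, within each of those groups, group the rows again by column #2." Rows that share the same values for every listed column end up in the same group. Note that this is grouping, not sorting: the output order is only guaranteed if you add an ORDER BY.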

Answer 5:

Columns that are the target of aggregate functions do not have to be part of the GROUP BY clause. Aggregate functions are functions such as SUM, AVG, MIN, MAX etc.

Answer 6:

Because an aggregate function gives you one return value per group.

Conceptually, the database first arranges the rows by the GROUP BY columns, then performs the aggregate operation on each distinct set of rows.

Source: https://stackoverflow.com/questions/5020022/joining-three-table-then-group

Tags: sql, database, join, group-by