MySQL order by before group by

梦谈多话 2020-11-22 06:57

There are plenty of similar questions to be found on here, but I don't think any of them answer the question adequately.

I'll continue from the current most popular qu

9 answers
  • 2020-11-22 07:40

    **Subqueries may hurt performance when used with large datasets.**

    Original query

    SELECT wp_posts.*
    FROM   wp_posts
    WHERE  wp_posts.post_status = 'publish'
           AND wp_posts.post_type = 'post'
    GROUP  BY wp_posts.post_author
    ORDER  BY wp_posts.post_date DESC; 
    

    Modified query

    SELECT p.post_status,
           p.post_type,
           MAX(p.post_date) AS latest_post_date,
           p.post_author
    FROM   wp_posts p
    WHERE  p.post_status = 'publish'
           AND p.post_type = 'post'
    GROUP  BY p.post_author
    ORDER  BY latest_post_date DESC;
    

    Because the SELECT clause uses MAX(p.post_date), it is possible to avoid subqueries and to order by that maximum after the GROUP BY. Note that this returns only these four columns (with each author's latest date), not the full post row.

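    Here is the modified query run end to end, using Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption: SQLite is a different engine, but this plain aggregate query means the same thing on both. The rows are invented to match the example dates used elsewhere on this page, and the draft row shows the WHERE filter at work.

```python
import sqlite3

# Hypothetical, cut-down wp_posts table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author TEXT,"
    " post_date TEXT, post_status TEXT, post_type TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (post_author, post_date, post_status, post_type)"
    " VALUES (?, ?, ?, ?)",
    [
        ("user A", "2012-12-01", "publish", "post"),
        ("user A", "2012-12-31", "publish", "post"),
        ("user B", "2012-10-01", "publish", "post"),
        ("user B", "2012-12-01", "publish", "post"),
        ("user B", "2013-01-15", "draft", "post"),  # excluded by the WHERE clause
    ],
)

# One row per author: the latest published post date, newest first.
latest = conn.execute(
    """
    SELECT p.post_author, MAX(p.post_date) AS latest_post_date
    FROM wp_posts p
    WHERE p.post_status = 'publish' AND p.post_type = 'post'
    GROUP BY p.post_author
    ORDER BY latest_post_date DESC
    """
).fetchall()

print(latest)  # [('user A', '2012-12-31'), ('user B', '2012-12-01')]
```

    ISO-formatted date strings sort correctly as text, which is why MAX() works on them here.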
  • 2020-11-22 07:42

    No. It makes no sense to order the records before grouping, since grouping is going to change the result set. The subquery approach is the preferred way. If it is too slow, you would have to change your table design. For example, you could store the ID of the last post for each author in a separate table, or introduce a boolean column marking which of each author's posts is the latest one.

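    The suggestion of storing each author's last post in a separate table can be sketched as follows, again with Python's sqlite3 standing in for MySQL. The side table `author_last_post` and the trigger `track_last_post` are hypothetical names. The answer does not say how that table should be kept current; an AFTER INSERT trigger is just one option, and MySQL trigger syntax differs from SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wp_posts (
        ID INTEGER PRIMARY KEY,
        post_author TEXT, post_date TEXT, post_status TEXT, post_type TEXT
    );

    -- Hypothetical side table: one row per author pointing at the latest post.
    CREATE TABLE author_last_post (
        post_author TEXT PRIMARY KEY,
        post_id INTEGER NOT NULL,
        post_date TEXT NOT NULL
    );

    -- On every new published post, record it as the author's latest post
    -- unless the side table already holds a post that is at least as new.
    CREATE TRIGGER track_last_post AFTER INSERT ON wp_posts
    WHEN NEW.post_status = 'publish' AND NEW.post_type = 'post'
    BEGIN
        INSERT OR REPLACE INTO author_last_post (post_author, post_id, post_date)
        SELECT NEW.post_author, NEW.ID, NEW.post_date
        WHERE NOT EXISTS (
            SELECT 1 FROM author_last_post
            WHERE post_author = NEW.post_author
              AND post_date >= NEW.post_date
        );
    END;
    """
)

conn.executemany(
    "INSERT INTO wp_posts (post_author, post_date, post_status, post_type)"
    " VALUES (?, ?, ?, ?)",
    [
        ("user A", "2012-12-31", "publish", "post"),
        ("user A", "2012-12-01", "publish", "post"),  # older: must not replace
        ("user B", "2012-10-01", "publish", "post"),
        ("user B", "2012-12-01", "publish", "post"),
        ("user B", "2013-01-15", "draft", "post"),    # ignored by the trigger
    ],
)

# Reading the latest post per author is now a plain join, with no GROUP BY.
latest = conn.execute(
    """
    SELECT p.ID, p.post_author, p.post_date
    FROM author_last_post AS l
    JOIN wp_posts AS p ON p.ID = l.post_id
    ORDER BY p.post_date DESC
    """
).fetchall()

print(latest)  # [(1, 'user A', '2012-12-31'), (4, 'user B', '2012-12-01')]
```

    The sketch only handles inserts. Updates, deletes and status changes (for example, a draft published later) would need their own handling, which is the usual cost of denormalizing.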
  • 2020-11-22 07:47

    What you are going to read is rather hacky, so don't try this at home!

    In SQL in general the answer to your question is NO. In MySQL, however, the answer is YES, because of its relaxed GROUP BY handling (mentioned by @bluefeet), which applies when the ONLY_FULL_GROUP_BY SQL mode is disabled.

    Suppose you have a BTREE index on (post_status, post_type, post_author, post_date). What does the index look like under the hood?

    (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01')
    (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31')
    (post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01')
    (post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01')

    That is, the data is sorted by all of those fields in ascending order.

    When you do a GROUP BY, MySQL versions before 8.0 sort the data by the grouping field by default (post_author in our case; post_status and post_type are fixed by the WHERE clause). If there is a matching index, MySQL takes the first index entry of each group, walking in ascending order. That is, the query will fetch the following (the first post for each user):

    (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01')
    (post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01')

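    The index walk described above can be modeled in plain Python. Sorting tuples gives the same order as a composite BTREE key, and keeping the first entry of each post_author run reproduces the "first record per group" behavior. This is only a model of what the answer describes for older MySQL versions. The data is invented, and nothing here touches MySQL.

```python
from itertools import groupby

# Invented index entries: (post_status, post_type, post_author, post_date).
entries = [
    ("publish", "post", "user B", "2012-12-01"),
    ("publish", "post", "user A", "2012-12-31"),
    ("publish", "post", "user B", "2012-10-01"),
    ("publish", "post", "user A", "2012-12-01"),
]

# A BTREE index keeps its entries in ascending order of the composite key,
# which is the same lexicographic order Python uses when sorting tuples.
index = sorted(entries)


def first_per_author(scan):
    """Keep only the first entry of each consecutive run of the same post_author."""
    return [next(run) for _, run in groupby(scan, key=lambda entry: entry[2])]


# Ascending walk: the first entry met for each author is that author's oldest post.
ascending = first_per_author(index)
for entry in ascending:
    print(entry)
```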
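    But MySQL also lets you specify the GROUP BY order explicitly. When you request post_author in descending order, it walks the index in the opposite direction, still taking the first entry of each group, which is now actually the last one. (The ASC/DESC qualifiers on GROUP BY were deprecated in MySQL 5.7 and removed in MySQL 8.0.13, so this only works on older versions.)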

    That is,

    ...
    WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author DESC
    

    will give us

    (post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01')
    (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31')

    Now, when you order the results of the grouping by post_date, you get the data you wanted.
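    The descending walk and the final ORDER BY can also be modeled in plain Python. This is a model only, with invented data. Walking the sorted tuples in reverse and keeping the first entry per author yields each author's latest post. Sorting those rows by date then gives the result the answer is after.

```python
from itertools import groupby

# Invented index entries: (post_status, post_type, post_author, post_date).
entries = [
    ("publish", "post", "user A", "2012-12-01"),
    ("publish", "post", "user A", "2012-12-31"),
    ("publish", "post", "user B", "2012-10-01"),
    ("publish", "post", "user B", "2012-12-01"),
]
index = sorted(entries)  # ascending composite-key order, as in the BTREE


def first_per_author(scan):
    """Keep only the first entry of each consecutive run of the same post_author."""
    return [next(run) for _, run in groupby(scan, key=lambda entry: entry[2])]


# GROUP BY post_author DESC: walk the index backwards, so the first entry
# met for each author is now that author's latest post.
grouped = first_per_author(reversed(index))

# ORDER BY post_date DESC, applied to the grouped rows.
result = sorted(grouped, key=lambda entry: entry[3], reverse=True)
for entry in result:
    print(entry)
```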

    SELECT wp_posts.*
    FROM wp_posts
    WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author DESC
    ORDER BY wp_posts.post_date DESC;
    

    NB:

    This is not what I would recommend for this particular query. In this case, I would use a slightly modified version of what @bluefeet suggests. But this technique can be very useful; take a look at my answer to "Retrieving the last record in each group".

    Pitfalls: the disadvantages of this approach are that

    • the result of the query depends on the index, which is against the spirit of SQL (indexes should only speed up queries);
    • the index knows nothing about its influence on the query (you or someone else might later find the index too resource-consuming and change it, breaking the query's results, not only its performance);
    • if you do not understand how the query works, you will most likely forget the explanation within a month, and the query will confuse you and your colleagues.

    The advantage is performance in hard cases. In this case, the query should perform the same as @bluefeet's query, because the same amount of data is involved in sorting: all of it is loaded into a temporary table and then sorted. (Incidentally, his query requires the (post_status, post_type, post_author, post_date) index as well.)

    What I would suggest:

    As I said, these queries make MySQL waste time sorting potentially huge amounts of data in a temporary table. When you need paging (that is, when a LIMIT is involved), most of that sorted data is thrown away anyway. What I would do is minimize the amount of sorted data: sort and limit as little data as possible in the subquery, then join back to the whole table.

    SELECT * 
    FROM wp_posts
    INNER JOIN
    (
      SELECT max(post_date) post_date, post_author
      FROM wp_posts
      WHERE post_status='publish' AND post_type='post'
      GROUP BY post_author
      ORDER BY post_date DESC
      -- LIMIT GOES HERE
    ) p2 USING (post_author, post_date)
    WHERE post_status='publish' AND post_type='post'
    ORDER BY post_date DESC; -- the derived table's order is not guaranteed to survive the join
    
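    A runnable version of the join-back query with the LIMIT filled in. It again uses sqlite3 as a stand-in for MySQL and invented rows. The `post_title` column and the page size of 2 are assumptions. It adds an outer ORDER BY, because the order of a derived table is not guaranteed to survive the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author TEXT,"
    " post_date TEXT, post_status TEXT, post_type TEXT, post_title TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (post_author, post_date, post_status, post_type, post_title)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("user A", "2012-12-01", "publish", "post", "A1"),
        ("user A", "2012-12-31", "publish", "post", "A2"),
        ("user B", "2012-10-01", "publish", "post", "B1"),
        ("user B", "2012-12-01", "publish", "post", "B2"),
        ("user C", "2012-11-15", "publish", "post", "C1"),
    ],
)

PAGE_SIZE = 2  # hypothetical page size

page = conn.execute(
    """
    SELECT wp_posts.post_author, wp_posts.post_date, wp_posts.post_title
    FROM wp_posts
    INNER JOIN (
        -- Sort and limit only (latest date, author) pairs, not whole rows.
        SELECT MAX(post_date) AS post_date, post_author
        FROM wp_posts
        WHERE post_status = 'publish' AND post_type = 'post'
        GROUP BY post_author
        ORDER BY post_date DESC
        LIMIT ?
    ) AS p2 USING (post_author, post_date)
    WHERE wp_posts.post_status = 'publish' AND wp_posts.post_type = 'post'
    ORDER BY wp_posts.post_date DESC
    """,
    (PAGE_SIZE,),
).fetchall()

print(page)  # [('user A', '2012-12-31', 'A2'), ('user B', '2012-12-01', 'B2')]
```

    If an author has two published posts with exactly the same post_date, both rows match the join, so a page can hold more rows than the LIMIT.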

    The same query using the approach described above:

    SELECT *
    FROM (
      SELECT post_id
      FROM wp_posts
      WHERE post_status='publish' AND post_type='post'
      GROUP BY post_author DESC
      ORDER BY post_date DESC
      -- LIMIT GOES HERE
    ) as ids
    JOIN wp_posts USING (post_id);
    

    All of these queries, with their execution plans, are on SQLFiddle.

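    Not part of the original answer: the GROUP BY ... DESC syntax used above no longer exists in current MySQL. For comparison, here is the ROW_NUMBER() window-function technique, available in MySQL 8.0 and in SQLite 3.25+. It gets the latest post per author without depending on any index. This is a sketch with sqlite3 as the stand-in engine and invented rows.

```python
import sqlite3

# Invented rows in a cut-down wp_posts table; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author TEXT,"
    " post_date TEXT, post_status TEXT, post_type TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (post_author, post_date, post_status, post_type)"
    " VALUES (?, ?, ?, ?)",
    [
        ("user A", "2012-12-01", "publish", "post"),
        ("user A", "2012-12-31", "publish", "post"),
        ("user B", "2012-10-01", "publish", "post"),
        ("user B", "2012-12-01", "publish", "post"),
    ],
)

latest = conn.execute(
    """
    SELECT ID, post_author, post_date
    FROM (
        -- Number each author's published posts from newest to oldest.
        SELECT ID, post_author, post_date,
               ROW_NUMBER() OVER (
                   PARTITION BY post_author
                   ORDER BY post_date DESC, ID DESC
               ) AS rn
        FROM wp_posts
        WHERE post_status = 'publish' AND post_type = 'post'
    ) AS ranked
    WHERE rn = 1  -- keep only the newest post per author
    ORDER BY post_date DESC
    """
).fetchall()

print(latest)  # [(2, 'user A', '2012-12-31'), (4, 'user B', '2012-12-01')]
```

    The `ID DESC` tiebreaker is an illustrative choice. It makes the query return exactly one row per author even when two posts share a post_date, unlike the join-back query.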