There are plenty of similar questions to be found on here, but I don't think that any answers the question adequately.
I'll continue from the current most popular qu
**Subqueries may have a bad impact on performance when used with large datasets**
Original query
SELECT wp_posts.*
FROM wp_posts
WHERE wp_posts.post_status = 'publish'
AND wp_posts.post_type = 'post'
GROUP BY wp_posts.post_author
ORDER BY wp_posts.post_date DESC;
Modified query
SELECT p.post_status,
p.post_type,
MAX(p.post_date) AS latest_post_date,
p.post_author
FROM wp_posts p
WHERE p.post_status = 'publish'
AND p.post_type = 'post'
GROUP BY p.post_author
ORDER BY latest_post_date DESC;
Because I'm using MAX in the SELECT clause (MAX(p.post_date)), it is possible to avoid subselect queries and order by the max column after the GROUP BY. Note that this returns only the latest date per author, not the other columns of that latest post.
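To make the MAX approach concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL only so the example is self-contained; the helper name latest_date_per_author, the cut-down table layout and the sample rows are assumptions made for illustration.

```python
import sqlite3

def latest_date_per_author(rows):
    """Return (post_author, latest_post_date) pairs, newest first.

    `rows` are (post_author, post_date, post_status, post_type) tuples.
    The table is a cut-down, hypothetical stand-in for wp_posts.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts ("
        "post_author TEXT, post_date TEXT, post_status TEXT, post_type TEXT)"
    )
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?, ?)", rows)
    # One row per author; the aggregate is aliased so ORDER BY can use it.
    result = conn.execute(
        """
        SELECT p.post_author, MAX(p.post_date) AS latest_post_date
        FROM wp_posts p
        WHERE p.post_status = 'publish' AND p.post_type = 'post'
        GROUP BY p.post_author
        ORDER BY latest_post_date DESC
        """
    ).fetchall()
    conn.close()
    return result

sample_rows = [
    ("user A", "2012-12-01", "publish", "post"),
    ("user A", "2012-12-31", "publish", "post"),
    ("user B", "2012-10-01", "publish", "post"),
    ("user B", "2012-12-01", "publish", "post"),
    ("user B", "2013-01-15", "draft", "post"),  # filtered out by WHERE
]
print(latest_date_per_author(sample_rows))
```

The dates are stored as ISO-8601 strings, so MAX and ORDER BY compare them chronologically.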
No. It makes no sense to order the records before grouping, since grouping is going to mutate the result set. The subquery way is the preferred way. If this is too slow, you would have to change your table design, for example by storing the id of the last post for each author in a separate table, or by introducing a boolean column indicating, for each author, which of their posts is the last one.
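As a rough illustration of the "separate table" design, the sketch below keeps a lookup table with one row per author pointing at that author's latest post. It uses Python's sqlite3 module with an in-memory SQLite database rather than MySQL, purely so it can be run as-is. The table author_last_post and the helpers publish_post and latest_posts are hypothetical names, and the wp_posts table is reduced to the columns the example needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wp_posts (
        ID INTEGER PRIMARY KEY,
        post_author INTEGER NOT NULL,
        post_date TEXT NOT NULL,
        post_status TEXT NOT NULL,
        post_type TEXT NOT NULL
    );
    -- Hypothetical lookup table: one row per author, pointing at the latest post.
    CREATE TABLE author_last_post (
        post_author INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES wp_posts(ID)
    );
    """
)

def publish_post(conn, author, post_date):
    """Insert a published post and keep author_last_post in sync."""
    with conn:  # both writes commit (or roll back) together
        cur = conn.execute(
            "INSERT INTO wp_posts (post_author, post_date, post_status, post_type) "
            "VALUES (?, ?, 'publish', 'post')",
            (author, post_date),
        )
        new_id = cur.lastrowid
        current = conn.execute(
            "SELECT p.post_date FROM author_last_post a "
            "JOIN wp_posts p ON p.ID = a.post_id WHERE a.post_author = ?",
            (author,),
        ).fetchone()
        # Move the pointer only if the new post is at least as recent.
        if current is None or post_date >= current[0]:
            conn.execute(
                "INSERT OR REPLACE INTO author_last_post (post_author, post_id) "
                "VALUES (?, ?)",
                (author, new_id),
            )
    return new_id

def latest_posts(conn):
    """Read the latest post per author without any GROUP BY."""
    return conn.execute(
        "SELECT p.post_author, p.post_date FROM author_last_post a "
        "JOIN wp_posts p ON p.ID = a.post_id ORDER BY p.post_date DESC"
    ).fetchall()
```

The read side becomes a plain join on primary keys; the cost moves to every write, which now has to maintain the pointer (deletes and status changes would need the same care and are not handled in this sketch).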
What you are going to read is rather hacky, so don't try this at home!
In SQL in general the answer to your question is NO, but because of the relaxed mode of GROUP BY (mentioned by @bluefeet), the answer is YES in MySQL.
Suppose you have a BTREE index on (post_status, post_type, post_author, post_date). What does the index look like under the hood?
(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01')
(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31')
(post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01')
(post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01')
That is, the data is sorted by all those fields in ascending order.
When you do a GROUP BY, MySQL by default sorts the data by the grouping field (post_author, in our case; post_status and post_type are required by the WHERE clause), and if there is a matching index, it takes the first record of each group in ascending order. That is, the query will fetch the following (the first post for each user):
(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01')
(post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01')
But GROUP BY in MySQL allows you to specify the order explicitly (this ASC/DESC qualifier was later removed in MySQL 8.0, so it only applies to older versions). When you request post_author in descending order, MySQL walks through our index in the opposite order, still taking the first record it meets for each group, which is actually the last one.
That is
...
WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author DESC
will give us
(post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01')
(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31')
Now, when you order the results of the grouping by post_date, you get the data you wanted.
SELECT wp_posts.*
FROM wp_posts
WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author DESC
ORDER BY wp_posts.post_date DESC;
NB:
This is not what I would recommend for this particular query. In this case, I would use a slightly modified version of what @bluefeet suggests. But this technique might be very useful. Take a look at my answer here: Retrieving the last record in each group
Pitfalls: The disadvantages of the approach are that
The advantage is performance in hard cases. In this case, the performance of the query should be the same as in @bluefeet's query, because of the amount of data involved in sorting (all data is loaded into a temporary table and then sorted; by the way, his query requires the (post_status, post_type, post_author, post_date) index as well).
What I would suggest:
As I said, those queries make MySQL waste time sorting potentially huge amounts of data in a temporary table. If you need paging (that is, LIMIT is involved), most of that data is then thrown away. What I would do is minimize the amount of sorted data: sort and limit a minimum of data in the subquery, and then join back to the whole table.
SELECT *
FROM wp_posts
INNER JOIN
(
SELECT max(post_date) post_date, post_author
FROM wp_posts
WHERE post_status='publish' AND post_type='post'
GROUP BY post_author
ORDER BY post_date DESC
-- LIMIT GOES HERE
) p2 USING (post_author, post_date)
WHERE post_status='publish' AND post_type='post';
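The "sort and limit in the subquery, then join back" idea can be tried out with the following minimal sketch. It runs against an in-memory SQLite database via Python's sqlite3 module instead of MySQL, so it says nothing about MySQL's execution plan; it only shows the shape of the query and its result. The helper latest_posts_page, the alias latest_post_date, the reduced table layout and the sample rows are assumptions for illustration.

```python
import sqlite3

SAMPLE_POSTS = [
    (1, "user A", "2012-12-01", "publish", "post", "A first"),
    (2, "user A", "2012-12-31", "publish", "post", "A last"),
    (3, "user B", "2012-10-01", "publish", "post", "B first"),
    (4, "user B", "2012-12-01", "publish", "post", "B last"),
    (5, "user B", "2013-01-15", "draft", "post", "B draft"),
]

def latest_posts_page(rows, limit):
    """Return the full latest post of each of the `limit` most recent authors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_author TEXT, "
        "post_date TEXT, post_status TEXT, post_type TEXT, post_title TEXT)"
    )
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT w.post_author, w.post_date, w.post_title
        FROM wp_posts w
        INNER JOIN (
            -- Only (author, max date) pairs are sorted and limited here.
            SELECT post_author, MAX(post_date) AS latest_post_date
            FROM wp_posts
            WHERE post_status = 'publish' AND post_type = 'post'
            GROUP BY post_author
            ORDER BY latest_post_date DESC
            LIMIT ?
        ) p2 ON p2.post_author = w.post_author
            AND p2.latest_post_date = w.post_date
        WHERE w.post_status = 'publish' AND w.post_type = 'post'
        -- The join does not guarantee order, so sort the small result again.
        ORDER BY w.post_date DESC
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result

print(latest_posts_page(SAMPLE_POSTS, 1))
```

The inner query handles only two narrow columns per author, and the wide rows are fetched just for the authors that survive the LIMIT. If an author has two posts with exactly the same latest post_date, both rows come back from the join.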
The same query using the approach described above:
SELECT *
FROM (
SELECT post_id
FROM wp_posts
WHERE post_status='publish' AND post_type='post'
GROUP BY post_author DESC
ORDER BY post_date DESC
-- LIMIT GOES HERE
) as ids
JOIN wp_posts USING (post_id);
All those queries with their execution plans on SQLFiddle.