I have a query that computes the median value over all rows of a table:
SELECT avg(t1.price) as median_val FROM (
SELECT @rownum:=@rownum+1 as `row_number`, d.price
FROM mediana d, (SELECT @rownum:=0) r
WHERE 1
ORDER BY d.price
) as t1,
(
SELECT count(*) as total_rows
FROM mediana d
WHERE 1
) as t2
WHERE t1.row_number>=total_rows/2 and t1.row_number<=total_rows/2+1;
Now I need the median not over the whole table, but grouped by date. Is that possible? http://sqlfiddle.com/#!2/7cf27 - as a result I would expect 2013-03-06 - 1.5, 2013-03-05 - 3.5.
I hope I didn't lose myself and overcomplicate things, but here's what I came up with:
SELECT sq.created_at, avg(sq.price) as median_val FROM (
SELECT t1.row_number, t1.price, t1.created_at FROM(
SELECT IF(@prev!=d.created_at, @rownum:=1, @rownum:=@rownum+1) as `row_number`, d.price, @prev:=d.created_at AS created_at
FROM mediana d, (SELECT @rownum:=0, @prev:=NULL) r
ORDER BY created_at, price
) as t1 INNER JOIN
(
SELECT count(*) as total_rows, created_at
FROM mediana d
GROUP BY created_at
) as t2
ON t1.created_at = t2.created_at
WHERE 1=1
AND t1.row_number>=t2.total_rows/2 and t1.row_number<=t2.total_rows/2+1
)sq
group by sq.created_at
What I did here is mainly to reset the row number to 1 whenever the date changes (which is why ordering by created_at is important) and to include the date so we can group by it. The subquery that counts the total rows also returns created_at and groups by it, so the two subqueries can be joined on the date.
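To see why the row-number filter picks the right rows, here is a small Python sketch of the same logic outside the database. The function name `median_by_group` and the sample rows are made up for illustration; they are not the fiddle's data.

```python
from collections import defaultdict

def median_by_group(rows):
    """rows: iterable of (created_at, price) pairs; returns {created_at: median}."""
    groups = defaultdict(list)
    for created_at, price in rows:
        groups[created_at].append(price)

    medians = {}
    for created_at, prices in groups.items():
        prices.sort()                 # mirrors ORDER BY created_at, price
        total_rows = len(prices)      # mirrors the count(*) ... GROUP BY subquery
        # Row numbers restart at 1 for every date, like the @rownum reset.
        middle = [
            price
            for row_number, price in enumerate(prices, start=1)
            if total_rows / 2 <= row_number <= total_rows / 2 + 1
        ]
        medians[created_at] = sum(middle) / len(middle)   # mirrors avg(sq.price)
    return medians

# Hypothetical sample rows, not taken from the fiddle.
sample = [
    ("2012-03-05", 4), ("2012-03-05", 3),
    ("2012-03-06", 1), ("2012-03-06", 2), ("2012-03-06", 9),
]
result = median_by_group(sample)
```

With an even count the filter keeps the two middle rows, and with an odd count it keeps only the single middle row, so averaging what is left gives the median in both cases.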
Here is another take on the median inspired by this post using SUBSTRING_INDEX and GROUP_CONCAT. I am not sure about the performance on large tables relative to the method described by @fancyPants that uses row numbers, but on smaller tables (~20K rows) it works very fast.
SET SESSION group_concat_max_len = 1000000;
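SELECT
    created_at,
    (
        -- lower middle value; DECIMAL without a scale is DECIMAL(10,0), so add one (e.g. DECIMAL(10,2)) if prices have fractional parts
        CAST(
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(
                    GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                    ',', FLOOR((COUNT(*)+1)/2)),
                ',', -1) AS DECIMAL) +
        -- upper middle value (the same element when the count is odd)
        CAST(
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(
                    GROUP_CONCAT(price ORDER BY price SEPARATOR ','),
                    ',', FLOOR((COUNT(*)+2)/2)),
                ',', -1) AS DECIMAL)
    ) / 2.0 AS median_price
FROM
    mediana
GROUP BY
    created_at
;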
Here is the output for the sqlfiddle given in the question (the fiddle appears to be broken, so I ran this in MySQL itself on the table shown in the fiddle):
+------------+--------------+
| created_at | median_price |
+------------+--------------+
| 2012-03-05 | 3.5000 |
| 2012-03-06 | 1.5000 |
+------------+--------------+
The GROUP_CONCAT essentially creates a string representation of an array of prices per created_at date, sorted by price. The two nested SUBSTRING_INDEX calls then pick out the middle value(s), i.e. the median. Two GROUP_CONCAT expressions are needed, and their results are averaged, to handle the case in which there is an even number of price elements for a single created_at date; when the number is odd, both expressions return the same element.
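The string trick may be easier to follow outside SQL. Below is a rough Python sketch of it; `substring_index` is a hand-written stand-in for MySQL's SUBSTRING_INDEX, `median_from_concat` is a made-up name, and `float` is used instead of CAST(... AS DECIMAL) purely to keep the example short.

```python
def substring_index(text, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX (written for this example)."""
    parts = text.split(delim)
    if count >= 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    return delim.join(parts[count:])       # everything right of the count-th delimiter from the end

def median_from_concat(prices):
    # Mirrors GROUP_CONCAT(price ORDER BY price SEPARATOR ',')
    concat = ",".join(str(p) for p in sorted(prices))
    n = len(prices)
    # Inner call keeps the first k elements, outer call takes the last of them.
    lower = substring_index(substring_index(concat, ",", (n + 1) // 2), ",", -1)
    upper = substring_index(substring_index(concat, ",", (n + 2) // 2), ",", -1)
    # float() is a simplification; the SQL above casts to DECIMAL instead.
    return (float(lower) + float(upper)) / 2.0

even_case = median_from_concat([4, 1, 3, 2])   # middle elements are 2 and 3
odd_case = median_from_concat([5, 1, 3])       # both positions point at 3
```

For four elements the two positions are 2 and 3, and for three elements both are 2, which is why the same expression covers even and odd counts.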
UPDATE:
It is worth mentioning that the result of the GROUP_CONCAT function is limited to 1024 bytes by default, see here. Longer results are truncated, which will cause a miscalculation. You can raise the limit with the command SET SESSION group_concat_max_len = N; where N is some other, larger value if you are concerned about large results. I have added that setting to the code snippet above. I chose 1000000, but you could use another value as well.
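As a rough way to judge whether a group would hit the limit, you can estimate the length of the concatenated string: the sum of the lengths of the values plus one separator between each pair. The sketch below does this in Python for a made-up group of 300 three-digit prices; `group_concat_length` is a hypothetical helper and it assumes single-byte characters.

```python
def group_concat_length(prices, separator=","):
    """Bytes needed to concatenate one group's prices (single-byte characters assumed)."""
    values = [str(p) for p in prices]
    return sum(len(v) for v in values) + len(separator) * max(len(values) - 1, 0)

DEFAULT_MAX_LEN = 1024   # MySQL's default group_concat_max_len

prices = list(range(100, 400))          # hypothetical group: 300 three-digit prices
needed = group_concat_length(prices)    # 300 * 3 digits + 299 separators
fits = needed <= DEFAULT_MAX_LEN
```

Here the group needs 1199 bytes, so with the default setting the string would be cut off before the end and the median positions could point at the wrong element.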
You can also spot check your results using COUNT(*) and OFFSET with one of your GROUP BY values. For example,
1. First get the count of the number of rows for a specific GROUP BY value,
SELECT COUNT(*) FROM mediana WHERE created_at = '2012-03-06';
2. Let X be the number of rows you get from step 1. Divide X by 2 to get half its value, Y.
3. Use the value Y as an offset to find the median.
a. If Y was a whole number then do both
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET (Y-1);
and
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET Y;
and average the two results to get the median value.
b. If Y was a decimal, then round Y down to the nearest whole number (call it W) and use that as a single offset,
SELECT price FROM mediana WHERE created_at = '2012-03-06' ORDER BY price LIMIT 1 OFFSET W;
and this will be your median value.
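The spot check above can be scripted. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; the COUNT(*) and LIMIT 1 OFFSET queries have the same shape). The function name `spot_check_median` and the inserted rows are made up for illustration.

```python
import sqlite3

def spot_check_median(conn, created_at):
    # Step 1: X = number of rows for this GROUP BY value.
    (x,) = conn.execute(
        "SELECT COUNT(*) FROM mediana WHERE created_at = ?", (created_at,)
    ).fetchone()
    if x == 0:
        return None

    def price_at(offset):
        (price,) = conn.execute(
            "SELECT price FROM mediana WHERE created_at = ? "
            "ORDER BY price LIMIT 1 OFFSET ?",
            (created_at, offset),
        ).fetchone()
        return price

    if x % 2 == 0:
        # Case a: Y = X/2 is whole, so average the rows at offsets Y-1 and Y.
        y = x // 2
        return (price_at(y - 1) + price_at(y)) / 2
    # Case b: Y is fractional, so round down to W and use that single offset.
    w = x // 2
    return price_at(w)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mediana (created_at TEXT, price REAL)")
conn.executemany(
    "INSERT INTO mediana VALUES (?, ?)",
    # Hypothetical rows, not the data from the fiddle.
    [("2012-03-05", 4), ("2012-03-05", 3),
     ("2012-03-06", 1), ("2012-03-06", 2), ("2012-03-06", 9)],
)
even_group = spot_check_median(conn, "2012-03-05")
odd_group = spot_check_median(conn, "2012-03-06")
```

Passing the offset as a bound parameter also sidesteps having to substitute Y-1 into the query text by hand.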
Source: https://stackoverflow.com/questions/15386799/count-median-grouped-by-day