What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x)
for finding the mean, but I'm having a hard time finding a simple way to calculate the median.
Building on velcrow's answer, here is a version for those of you who need the median of each group when the data is grouped by another column:
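SELECT t1.grp_field, t1.val FROM (
  SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS `row_number`,
  @s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
  -- start @s as NULL: comparing 0 with a string group value can be true in MySQL
  FROM data d, (SELECT @rownum:=0, @s:=NULL) r
  ORDER BY grp_field, d.val
) as t1 JOIN (
  SELECT grp_field, count(*) as total_rows
  FROM data d
  GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
-- row_number starts at 0 in each group, so the middle row is floor(total_rows/2)
-- (the upper of the two middle rows when the count is even)
WHERE t1.`row_number` = floor(total_rows/2);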
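To sanity-check the per-group results, here is a minimal Python sketch (an illustration in Python, not part of the MySQL answer) of the same idea: sort the rows by group and value, number them from 0 within each group, and keep the row at position floor(n/2). The function name `grouped_median` and the sample rows are made up for the example; for an even-sized group it returns the upper of the two middle values.

```python
from itertools import groupby


def grouped_median(rows):
    """Return {group: median} for (group, value) pairs.

    Hypothetical helper mirroring the SQL idea: rows are numbered from 0
    inside each group, and the row at position floor(n / 2) is kept, so an
    even-sized group yields the upper of its two middle values.
    """
    result = {}
    ordered = sorted(rows)  # like ORDER BY grp_field, val
    for group, members in groupby(ordered, key=lambda row: row[0]):
        vals = [val for _, val in members]  # list index = 0-based row number in the group
        total_rows = len(vals)              # like count(*) ... GROUP BY grp_field
        result[group] = vals[total_rows // 2]
    return result


if __name__ == "__main__":
    data = [("a", 3), ("a", 1), ("a", 2),
            ("b", 10), ("b", 40), ("b", 20), ("b", 30)]
    print(grouped_median(data))  # {'a': 2, 'b': 30}
```

If you need the average of the two middle values for even-sized groups instead, you would keep both positions (n - 1) // 2 and n // 2 and average them.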
Most of the solutions above work on only one column of the table, but you might need the median (50th percentile) of several columns in the same query.
I use this:
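-- COUNT(field_name) skips NULLs just like GROUP_CONCAT does, FLOOR keeps the position whole,
-- and a bare DECIMAL means DECIMAL(10,0), so give a precision that fits your data
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', FLOOR(50/100 * COUNT(field_name)) + 1), ',', -1) AS DECIMAL(20,4)) AS `Median`
FROM table_name;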
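You can replace the 50 in the example above with any percentile, and it runs as a single query with no joins or variables.
Just make sure group_concat_max_len is large enough to hold the whole concatenated list: the default is 1024 bytes, and longer results are truncated (with only a warning), which makes the result wrong. You can raise the limit with: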
SET group_concat_max_len = 10485760; #10MB max length
More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
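The position arithmetic is easy to get off by one, so here is a rough Python model (the name `percentile_by_position` is hypothetical, and it assumes an integer percentage `p`) of what the GROUP_CONCAT/SUBSTRING_INDEX pair does: join the sorted non-NULL values with commas, cut the string after the k-th item, and take the last item of that prefix. The sketch uses k = floor(p/100 * n) + 1, rounding the fractional position down explicitly.

```python
def percentile_by_position(values, p):
    """Model of SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(...), ',', k), ',', -1).

    values: numbers, with None standing in for SQL NULL.
    p: integer percentile, e.g. 50 for the median.
    """
    present = sorted(v for v in values if v is not None)  # GROUP_CONCAT skips NULLs
    if not present:
        return None
    joined = ",".join(str(v) for v in present)  # GROUP_CONCAT(... ORDER BY ...)
    k = p * len(present) // 100 + 1             # floor(p/100 * n) + 1, 1-based
    items = joined.split(",")
    prefix = items[:k]          # SUBSTRING_INDEX(joined, ',', k): first k items
    return float(prefix[-1])    # SUBSTRING_INDEX(prefix, ',', -1): last of them


if __name__ == "__main__":
    print(percentile_by_position([5, 1, 3], 50))        # 3.0
    print(percentile_by_position([1, 2, 3, 4], 50))     # 3.0 (upper middle value)
    print(percentile_by_position([None, 2, 8, 4], 95))  # 8.0
```

Without the explicit floor, the position for n = 3 and p = 50 would be 2.5, and if that gets rounded up to 3 the largest value is returned instead of the middle one.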
My version needs no temporary tables or user variables, and it averages the two middle values when the count is even:
SELECT (
  SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val), ',', FLOOR(1 + ((COUNT(val) - 1) / 2))), ',', -1)
  +
  SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val), ',', CEILING(1 + ((COUNT(val) - 1) / 2))), ',', -1)
) / 2 AS median
FROM table_name;
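For checking results outside MySQL, this Python sketch (the name `median_two_positions` is hypothetical) reproduces the same position arithmetic: it averages the items at 1-based positions floor(1 + (n-1)/2) and ceiling(1 + (n-1)/2), which coincide when n is odd and are the two middle items when n is even.

```python
import math


def median_two_positions(values):
    """Average the items at 1-based positions floor(1 + (n-1)/2) and
    ceil(1 + (n-1)/2) of the sorted non-NULL values (None = SQL NULL)."""
    ordered = sorted(v for v in values if v is not None)  # count(val) skips NULLs
    n = len(ordered)
    if n == 0:
        return None
    low = math.floor(1 + (n - 1) / 2)
    high = math.ceil(1 + (n - 1) / 2)
    # Subtract 1 to turn the 1-based SQL positions into Python indexes.
    return (ordered[low - 1] + ordered[high - 1]) / 2


if __name__ == "__main__":
    print(median_two_positions([3, 1, 2]))     # 2.0
    print(median_two_positions([4, 1, 3, 2]))  # 2.5
```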
Another riff on velcrow's answer, but it uses a single intermediate table and reuses the row-numbering variable to get the count, rather than running an extra query to calculate it. It also starts the count at 0 for the first row, so that Floor and Ceil of @rows / 2 select the median row(s) directly.
SELECT Avg(tmp.val) as median_val
FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
      FROM data as inTab, (SELECT @rows := -1) as init
      -- Replace with better where clause or delete
      WHERE 2 > 1
      ORDER BY inTab.val) as tmp
WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
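The trick here is that once the derived table is built, @rows holds n - 1, the number of the last row. A small Python model of that bookkeeping (plain Python rather than MySQL; the name `median_with_row_counter` is hypothetical) makes it explicit:

```python
import math


def median_with_row_counter(values):
    """Model of the @rows query: number the sorted rows from 0, then use the
    final counter value (n - 1) to pick rows floor(@rows/2) and ceil(@rows/2)."""
    rows = -1                          # (SELECT @rows := -1) AS init
    numbered = []
    for val in sorted(values):         # ORDER BY inTab.val
        rows += 1                      # @rows := @rows + 1 AS rowNum
        numbered.append((val, rows))
    if rows < 0:
        return None                    # no rows: AVG() would return NULL
    wanted = {math.floor(rows / 2), math.ceil(rows / 2)}  # rowNum IN (...)
    picked = [val for val, row_num in numbered if row_num in wanted]
    return sum(picked) / len(picked)   # AVG(tmp.val)


if __name__ == "__main__":
    print(median_with_row_counter([10, 30, 20]))  # 20.0
    print(median_with_row_counter([4, 1, 3, 2]))  # 2.5
```

The set mirrors `IN (...)`: when floor and ceil coincide (odd n), the middle row is still matched only once.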
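This self-join needs no subquery, but it only returns a value that has exactly as many values below it as above it. That is the median when the count is odd and the values are distinct; with an even count of distinct values no such value exists and the query returns no rows, and duplicates can have the same effect. The cross join also produces n² rows, so it gets slow on large tables.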
SELECT AVG(t1.x)
FROM table_name t1, table_name t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0;
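To see when the HAVING test fires, here is a brute-force Python re-enactment of the cross join (the name `self_join_median` is hypothetical; it returns one entry per group that passes, just as the SQL returns one row per group). With the sample data it finds the median for an odd count of distinct values but nothing for an even count, because no value there has equally many values below and above it.

```python
def sign(x):
    """SQL SIGN(): -1, 0 or 1."""
    return (x > 0) - (x < 0)


def self_join_median(values):
    """Model of:
        SELECT AVG(t1.x) FROM t t1, t t2
        GROUP BY t1.x HAVING SUM(SIGN(t1.x - t2.x)) = 0
    Returns a list with one entry per group that passes the HAVING test."""
    results = []
    for x in sorted(set(values)):                    # GROUP BY t1.x
        t1_rows = [v for v in values if v == x]
        # Every t1 row in the group is paired with every t2 row (cross join).
        total = sum(sign(a - b) for a in t1_rows for b in values)
        if total == 0:                               # HAVING ... = 0
            results.append(sum(t1_rows) / len(t1_rows))  # AVG(t1.x) within the group
    return results


if __name__ == "__main__":
    print(self_join_median([3, 1, 2]))     # [2.0]
    print(self_join_median([4, 1, 3, 2]))  # []
```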
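Here is my way. Outside prepared statements and stored programs, LIMIT only accepts integer constants, so the offset is built into the query string and run as a prepared statement. Of course, you could also put it into a procedure :-)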
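-- 0-based offset of the middle row (the lower middle row when the count is even)
SET @median_counter = (SELECT FLOOR((COUNT(*) - 1) / 2) AS `median_counter` FROM `data`);
SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');
PREPARE median FROM @median;
EXECUTE median;
DEALLOCATE PREPARE median;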
You can avoid the @median_counter variable by substituting the subquery directly:
SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
(SELECT FLOOR((COUNT(*) - 1) / 2) AS `median_counter` FROM `data`),
', 1'
);
PREPARE median FROM @median;
EXECUTE median;
DEALLOCATE PREPARE median;
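The only arithmetic in this approach is the LIMIT offset. A quick Python model of it (the name `median_by_limit_offset` is hypothetical) uses the 0-based offset floor((n - 1)/2). That is the middle row for an odd count and the lower of the two middle rows for an even count, since `LIMIT offset, 1` fetches a single row. By contrast, an offset of FLOOR(COUNT(*)/2) - 1 would pick the smallest value for n = 3.

```python
def median_by_limit_offset(values):
    """Model of SELECT val FROM data ORDER BY val LIMIT offset, 1
    with offset = FLOOR((COUNT(*) - 1) / 2)."""
    ordered = sorted(values)
    if not ordered:
        return None                       # nothing to select
    offset = (len(ordered) - 1) // 2      # 0-based, like LIMIT's first argument
    return ordered[offset]                # LIMIT offset, 1 -> exactly one row


if __name__ == "__main__":
    for sample in ([7], [3, 1, 2], [4, 1, 3, 2], [5, 1, 4, 2, 3]):
        print(sample, median_by_limit_offset(sample))
```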