Simple way to calculate median with MySQL

北荒 2020-11-22 04:20

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time fi

30 answers
  • 2020-11-22 04:51

    Building on velcrow's answer, for those of you who need a median of something that is grouped by another column:

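    SELECT t1.grp_field, t1.val FROM (
      -- Number the rows within each group, restarting at 1 when grp_field changes.
      -- row_number is a reserved word in MySQL 8.0, hence the alias row_num.
      SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 1) AS row_num,
        @s:=grp_field AS sec, d.val
      FROM data d, (SELECT @rownum:=0, @s:=NULL) r
      ORDER BY grp_field, d.val
    ) AS t1 JOIN (
      SELECT grp_field, COUNT(*) AS total_rows
      FROM data d
      GROUP BY grp_field
    ) AS t2
    ON t1.grp_field = t2.grp_field
    -- Odd counts: the middle row. Even counts: the upper of the two middle rows.
    WHERE t1.row_num = FLOOR(total_rows/2) + 1;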
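
To make the row-selection rule easier to check, here is a minimal Python sketch of the same idea. It is not the MySQL query itself: it assumes rows are numbered from 1 within each group and that the row at position FLOOR(n/2) + 1 is taken. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def grouped_median(rows):
    """Return {group: value of the row at 1-based position floor(n/2) + 1}.

    rows is an iterable of (group, value) pairs, standing in for
    the (grp_field, val) columns of the query.
    """
    by_group = defaultdict(list)
    for grp, val in rows:
        by_group[grp].append(val)

    result = {}
    for grp, vals in by_group.items():
        vals.sort()  # ORDER BY grp_field, val
        # 1-based position floor(n/2) + 1 is the 0-based index n // 2.
        result[grp] = vals[len(vals) // 2]
    return result

# Hypothetical sample data: three groups with 3, 4 and 1 rows.
sample = [("a", 3), ("a", 1), ("a", 2),
          ("b", 10), ("b", 40), ("b", 20), ("b", 30),
          ("c", 7)]
medians = grouped_median(sample)
print(medians)
```

For a group with an even number of rows this rule returns the upper of the two middle values rather than their average.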
    

  • 2020-11-22 04:52

    Most of the solutions above work for only one field of the table, but you might need the median (50th percentile) of several fields in the same query.

    I use this:

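    SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
     GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
      -- 1-based position of the percentile; FLOOR makes the rounding explicit,
      -- and COUNT(field_name) skips NULLs just as GROUP_CONCAT does
      ',', FLOOR(50/100 * COUNT(field_name)) + 1), ',', -1) AS DECIMAL) AS `Median`
    FROM table_name;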
    

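    You can replace the "50" in the example above with any other percentile, and the whole calculation stays in a single query.

    Just make sure group_concat_max_len is large enough to hold the whole GROUP_CONCAT result (the default is 1024 bytes, and longer results are truncated). You can change it with: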

    SET group_concat_max_len = 10485760; #10MB max length
    

    More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
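
The position arithmetic can be sketched in Python as follows. This is an illustration under stated assumptions, not the MySQL query: it assumes an integer percentile and a 1-based position of FLOOR(pct/100 * n) + 1, capped at n because SUBSTRING_INDEX returns the whole string when asked for more elements than exist. The function names are hypothetical.

```python
def percentile_position(n, pct):
    """1-based position in the sorted list: floor(pct/100 * n) + 1, capped at n.

    Integer arithmetic avoids floating-point surprises for integer pct.
    """
    return min(pct * n // 100 + 1, n)

def percentile(values, pct):
    # sorted() stands in for GROUP_CONCAT(... ORDER BY ...);
    # the indexing stands in for the two nested SUBSTRING_INDEX calls.
    ordered = sorted(values)
    return ordered[percentile_position(len(ordered), pct) - 1]

print(percentile([5, 1, 4, 2, 3], 50))
print(percentile(range(1, 101), 95))
```

With an even number of values and pct = 50, this picks the upper of the two middle values.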

  • 2020-11-22 04:52

    My code, which needs no extra tables or additional variables:

    SELECT
      (SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val), ',',
                                       FLOOR(1 + (COUNT(val) - 1) / 2)), ',', -1)
       +
       SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val), ',',
                                       CEILING(1 + (COUNT(val) - 1) / 2)), ',', -1)
      ) / 2 AS median
    FROM `table`;  -- backticks are needed because TABLE is a reserved word
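
The floor/ceiling positions used above can be checked with a short Python sketch. It mirrors only the arithmetic, with a sorted list standing in for the GROUP_CONCAT string; the function name is hypothetical.

```python
import math

def median_floor_ceil(values):
    """Average the elements at 1-based positions floor(1 + (n-1)/2) and ceil(1 + (n-1)/2)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = math.floor(1 + (n - 1) / 2)  # lower middle position (1-based)
    hi = math.ceil(1 + (n - 1) / 2)   # upper middle position (1-based)
    # For odd n both positions coincide, so the average is the middle value itself.
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

print(median_floor_ceil([1, 2, 3, 4]))
print(median_floor_ceil([5, 1, 4, 2, 3]))
```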
    
  • 2020-11-22 04:52

    Another riff on velcrow's answer, but this one uses a single intermediate table and reuses the row-numbering variable to get the count, rather than running an extra query to calculate it. It also starts the count so that the first row is row 0, which allows simply using Floor and Ceil to select the median row(s).

    SELECT Avg(tmp.val) as median_val
        FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
                  FROM data as inTab,  (SELECT @rows := -1) as init
                  -- Replace with better where clause or delete
                  WHERE 2 > 1
                  ORDER BY inTab.val) as tmp
        WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
    
  • 2020-11-22 04:52

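    This approach avoids a subquery. It returns the value that has as many smaller values as larger ones, so it works for an odd number of distinct values; with an even number of distinct values no single value is balanced that way, and the query returns no rows.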

    SELECT AVG(t1.x)
    FROM `table` t1, `table` t2  -- backticks are needed because TABLE is a reserved word
    GROUP BY t1.x
    HAVING SUM(SIGN(t1.x - t2.x)) = 0;
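
A quick way to see when the sign-sum condition holds is to evaluate it by hand on small inputs. The Python sketch below applies the same HAVING test to each value; the function name is hypothetical, and this is only a check of the condition, not a replacement for the query.

```python
def balanced_values(values):
    """Distinct values x for which sum(sign(x - y) for y in values) == 0."""
    def sign(d):
        # Same result as SQL SIGN(): -1, 0 or 1.
        return (d > 0) - (d < 0)

    return sorted({x for x in values
                   if sum(sign(x - y) for y in values) == 0})

print(balanced_values([1, 2, 3, 4, 5]))  # odd count, distinct values
print(balanced_values([1, 2, 3, 4]))     # even count, distinct values
print(balanced_values([1, 2, 2, 3]))     # even count with a repeated middle value
```

The second call returns an empty list, which corresponds to the query returning no rows.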
    
  • 2020-11-22 04:54

    Here is my way. Of course, you could put it into a procedure :-)

    -- Zero-based offset of the middle row (the lower middle row for even counts).
    SET @median_counter = (SELECT FLOOR((COUNT(*) - 1) / 2) AS `median_counter` FROM `data`);
    
    SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');
    
    PREPARE median FROM @median;
    
    EXECUTE median;
    

    You could avoid the variable @median_counter if you substitute it:

    SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
                          (SELECT FLOOR((COUNT(*) - 1) / 2) AS `median_counter` FROM `data`),
                          ', 1'
                        );
    
    PREPARE median FROM @median;
    
    EXECUTE median;
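
As a rough illustration of what `LIMIT offset, 1` selects, here is a Python sketch. It assumes the zero-based offset is FLOOR((COUNT(*) - 1) / 2), which picks the middle row for odd counts and the lower middle row for even counts; the function name is hypothetical.

```python
def median_by_offset(values):
    """Pick one row the way ORDER BY val LIMIT offset, 1 would."""
    ordered = sorted(values)            # ORDER BY val
    offset = (len(ordered) - 1) // 2    # assumed offset: FLOOR((COUNT(*) - 1) / 2)
    picked = ordered[offset:offset + 1]  # LIMIT offset, 1 yields at most one row
    return picked[0] if picked else None  # None stands in for an empty result set

print(median_by_offset([5, 1, 4, 2, 3]))
print(median_by_offset([1, 2, 3, 4]))
```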
    