Simple way to calculate median with MySQL

北荒 2020-11-22 04:20

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way to compute the median.

30 answers
  • 2020-11-22 04:54

    A simple way to calculate the median in MySQL:

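    -- Count the non-NULL values once, then number the sorted rows with @row.
    set @ct := (select count(val) from station);
    set @row := 0;
    
    -- Keep the middle row (odd count) or the two middle rows (even count) and average them.
    -- Note: this relies on @row being incremented in sorted order; assigning user
    -- variables inside expressions is deprecated in MySQL 8.0.
    select avg(a.val) as median from
    (select val from station where val is not null order by val) a
    where (select @row := @row + 1)
    between @ct/2.0 and @ct/2.0 + 1;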
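    To see which row numbers survive the BETWEEN filter, here is a minimal Python model of the query. It is an illustration of the logic, not MySQL itself; `median_by_row_window` is a hypothetical name, and it assumes NULL values are skipped. It sorts the values, numbers them from 1 as @row does, and averages the rows whose number lies between n/2 and n/2 + 1.

```python
def median_by_row_window(values):
    """Model of the user-variable query: number the sorted non-NULL values
    1..n and average those whose row number lies in [n/2, n/2 + 1]."""
    vals = sorted(v for v in values if v is not None)
    ct = len(vals)
    if ct == 0:
        return None  # AVG over zero rows is NULL in SQL
    # For odd n only row (n+1)/2 fits the window; for even n rows n/2 and n/2+1 do.
    picked = [v for row, v in enumerate(vals, start=1)
              if ct / 2 <= row <= ct / 2 + 1]
    return sum(picked) / len(picked)


print(median_by_row_window([3, 1, 2]))     # 2.0
print(median_by_row_window([4, 1, 3, 2]))  # 2.5
```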
    
  • 2020-11-22 04:57

    MySQL 8.0 and later support ROW_NUMBER() and common table expressions, so the median can be computed like this (adapted from a SQL Server query):

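    WITH Numbered AS
    (
    SELECT val,
        COUNT(*) OVER () AS Cnt,
        ROW_NUMBER() OVER (ORDER BY val) AS RowNum
    FROM yourtable
    WHERE val IS NOT NULL
    )
    -- DIV is integer division; with "/" MySQL would compute 2.5 for Cnt = 4 and skip a row
    SELECT AVG(val) AS median
    FROM Numbered
    WHERE RowNum IN ((Cnt + 1) DIV 2, (Cnt + 2) DIV 2);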
    

    The IN covers an even number of entries, where both middle rows are selected. For an odd count, both expressions give the same row number.

    To find the median per group, add PARTITION BY your grouping column to both OVER clauses.

    Rob
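    A minimal Python model of the ROW_NUMBER() approach, for illustration only. It assumes the row positions are computed with integer division (MySQL's DIV, which Python's `//` matches for non-negative integers) and, as an assumption, averages the selected rows to get a single median. `median_by_row_number` and `median_per_group` are hypothetical names; the second one mimics PARTITION BY by grouping (key, value) pairs first.

```python
from collections import defaultdict


def median_by_row_number(values):
    """Mirror the ROW_NUMBER() query: sort the non-NULL values, keep row numbers
    (Cnt + 1) DIV 2 and (Cnt + 2) DIV 2, and average them."""
    vals = sorted(v for v in values if v is not None)
    cnt = len(vals)
    if cnt == 0:
        return None  # AVG over zero rows is NULL
    wanted = {(cnt + 1) // 2, (cnt + 2) // 2}  # 1-based row numbers
    picked = [v for row_num, v in enumerate(vals, start=1) if row_num in wanted]
    return sum(picked) / len(picked)


def median_per_group(rows):
    """PARTITION BY analogue: rows are (group_key, value) pairs."""
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)
    return {key: median_by_row_number(vals) for key, vals in groups.items()}


print(median_by_row_number([5, 3, 1]))                    # 3.0
print(median_per_group([("a", 1), ("a", 3), ("b", 10)]))  # {'a': 2.0, 'b': 10.0}
```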

  • 2020-11-22 04:58

    I propose a faster way.

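    Get the position of the middle row (half the row count, rounded up) into a user variable:

    SELECT CEIL(COUNT(*)/2) INTO @middlevalue FROM data;

    Then take the largest value among the first @middlevalue sorted rows. LIMIT does not accept a user variable in a plain statement, so pass it through a prepared statement:

    PREPARE stmt FROM 'SELECT MAX(val) FROM (SELECT val FROM data ORDER BY val LIMIT ?) x';
    EXECUTE stmt USING @middlevalue;
    DEALLOCATE PREPARE stmt;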

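    I tested this on a dataset of 5x10e6 random numbers, and it found the median in under 10 seconds. Note that for an even row count it returns the lower of the two middle values rather than their average.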
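    A minimal Python model of the two statements, for illustration only (`median_by_limit` is a hypothetical name, and it assumes the column has no NULLs, since COUNT(*) counts them). It computes CEIL(n/2) and takes the MAX of that many smallest values, which is simply the value at that sorted position.

```python
import math


def median_by_limit(values):
    """middle = CEIL(COUNT(*)/2); result = MAX of the first `middle` sorted values."""
    vals = sorted(values)
    if not vals:
        return None  # LIMIT 0 leaves MAX() with no rows, i.e. NULL
    middle = math.ceil(len(vals) / 2)
    return max(vals[:middle])


print(median_by_limit([5, 1, 4, 2, 3]))  # 3
print(median_by_limit([4, 1, 3, 2]))     # 2 (lower middle value, not 2.5)
```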

  • 2020-11-22 04:58

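    I found the code below on HackerRank. It is simple, but it only works reliably when the number of values is odd and the values are distinct: with an even count, or when the middle value is repeated, it can return no rows.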

    SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE  
      (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) = 
      (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
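    The WHERE clause keeps every value with as many values strictly below it as strictly above it. A minimal Python model of that test (hypothetical name `median_by_balance`, assuming no NULLs) makes the limitation concrete:

```python
def median_by_balance(values):
    """Return every value whose count of smaller values equals its count of larger values."""
    return [m for m in values
            if sum(v < m for v in values) == sum(v > m for v in values)]


print(median_by_balance([3, 1, 2]))     # [2]
print(median_by_balance([1, 2, 3, 4]))  # [] (even count: no value is balanced)
print(median_by_balance([1, 2, 2]))     # [] (repeated middle value)
```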
    
  • 2020-11-22 04:58

    This handles both odd and even value counts: for an even count it returns the average of the two middle values. It self-joins the table, so it compares every pair of rows.

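    SELECT AVG(val) AS median FROM
      ( SELECT x.id, x.val
          FROM data x, data y          -- self-join: every x is paired with every y
          GROUP BY x.id, x.val
          -- The SUM counts the y rows with a smaller val, or an equal val and id >= x.id:
          -- x's 1-based rank. Keep the middle rank (odd count) or the two middle ranks (even).
          HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val))))
                 IN (ROUND(COUNT(*)/2), ROUND((COUNT(*)+1)/2))
      ) sq;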
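    The HAVING expression is hard to read, so here is a minimal Python model of what it computes, for illustration only (hypothetical name `median_by_self_join`, assuming unique ids and no NULLs). For each x it counts the y rows the SIGN expression counts, which gives x's rank with ties broken by id. MySQL's ROUND rounds these exact halves away from zero, so ROUND(n/2) and ROUND((n+1)/2) equal (n+1)//2 and (n+2)//2 here.

```python
def median_by_self_join(rows):
    """rows: list of (id, val) pairs with unique ids."""
    n = len(rows)
    if n == 0:
        return None
    wanted = {(n + 1) // 2, (n + 2) // 2}  # ROUND(n/2), ROUND((n+1)/2)
    picked = []
    for x_id, x_val in rows:
        # y counts if it has a smaller val, or the same val and id >= x_id (x itself included).
        rank = sum(1 for y_id, y_val in rows
                   if y_val < x_val or (y_val == x_val and y_id >= x_id))
        if rank in wanted:
            picked.append(x_val)
    return sum(picked) / len(picked)


print(median_by_self_join([(1, 7), (2, 3), (3, 5)]))          # 5.0
print(median_by_self_join([(1, 1), (2, 2), (3, 3), (4, 4)]))  # 2.5
```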
    
  • 2020-11-22 04:59

    I used a two query approach:

    • first one to get count, min, max and avg
    • second one (a prepared statement) with "ORDER BY .." and "LIMIT @count/2, 1" clauses to get the median value

    These are wrapped in a function definition, so all values can be returned from one call.

    If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.
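    A minimal Python sketch of what the two queries return together, for illustration only. `summary_with_median` is a hypothetical helper, not the answerer's function, and the answer does not say how @count/2 becomes an integer offset; this sketch assumes integer division, which for an even count picks the upper of the two middle values.

```python
def summary_with_median(values):
    """Query 1 -> count, min, max, avg; query 2 -> ORDER BY val LIMIT offset, 1."""
    count = len(values)
    if count == 0:
        return None
    stats = {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / count,
    }
    offset = count // 2  # assumption: @count/2 truncated to an integer offset
    stats["median"] = sorted(values)[offset]  # LIMIT offset, 1 skips `offset` rows
    return stats


print(summary_with_median([4, 1, 3, 2, 5]))
# {'count': 5, 'min': 1, 'max': 5, 'avg': 3.0, 'median': 3}
```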
