Simple way to calculate median with MySQL

北荒 2020-11-22 04:20

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time fi

30 answers
  • 2020-11-22 04:32

    A single query to achieve the perfect median:

    -- Note: GROUP_CONCAT output is truncated at group_concat_max_len (default 1024)
    SELECT
        COUNT(*) AS total_rows,
        IF(COUNT(*) % 2 = 1,
            -- odd row count: take the single middle value
            CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL),
            -- even row count: average the two middle values (ROUND drops any fraction)
            ROUND((
                CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
              + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)
            ) / 2)
        ) AS median,
        AVG(val) AS average
    FROM
        data
    
  • 2020-11-22 04:34

    I found the accepted solution didn't work on my MySQL install, returning an empty set, but this query worked for me in all situations that I tested it on:

    SELECT x.val from data x, data y
    GROUP BY x.val
    HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
    LIMIT 1
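
    To see what the HAVING clause computes, here is a minimal Python sketch of the same logic (not MySQL; the function names `sign` and `median_by_sign` are made up for illustration). For each candidate x it counts the rows with y <= x, which is what SUM(SIGN(1-SIGN(y.val-x.val))) does, and returns the smallest x whose share of such rows exceeds one half. It assumes the grouped rows come back in ascending order of val, which the query itself does not guarantee without an ORDER BY.

```python
def sign(n):
    """Return -1, 0 or 1, like MySQL's SIGN()."""
    return (n > 0) - (n < 0)


def median_by_sign(values):
    """Smallest x whose share of rows with y <= x is above one half."""
    total = len(values)
    for x in sorted(set(values)):  # assumption: groups are visited in ascending order
        # sign(1 - sign(y - x)) is 1 when y <= x and 0 when y > x
        at_or_below = sum(sign(1 - sign(y - x)) for y in values)
        if at_or_below / total > 0.5:
            return x
    return None  # empty input, like an empty result set


print(median_by_sign([5, 1, 3]))     # odd number of rows
print(median_by_sign([1, 2, 3, 4]))  # even number of rows
```

    With an even number of rows this picks the upper of the two middle values rather than their average.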
    
  • 2020-11-22 04:35

    Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.

    Think, e.g., median sale price for used cars in a car lot, grouped by year-month.

    SELECT 
        period, 
        AVG(middle_values) AS 'median' 
    FROM (
        SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
        FROM (
            SELECT 
                @last_period:=@period AS 'last_period',
                @period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
                IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`, 
                x.sale_price
              FROM listings AS x, (SELECT @row:=0) AS r
              WHERE 1
                -- where criteria goes here
              ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
            ) AS t1
        LEFT JOIN (  
              SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
              FROM listings x
              WHERE 1
                -- same where criteria goes here
              GROUP BY DATE_FORMAT(sale_date, '%Y-%m')
            ) AS t2
            ON t1.period = t2.period
        ) AS t3
    WHERE 
        row_num >= (count/2) 
        AND row_num <= ((count/2) + 1)
    GROUP BY t3.period
    ORDER BY t3.period;
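
    The row window in the outer WHERE clause can be checked with a small Python sketch (not MySQL; `grouped_median` and the sample rows are hypothetical). It numbers the sorted prices of each period from 1 and keeps the rows with count/2 <= row_num <= count/2 + 1, then averages them, as the query does.

```python
from collections import defaultdict


def grouped_median(rows):
    """rows: iterable of (period, sale_price) pairs. Returns {period: median}."""
    by_period = defaultdict(list)
    for period, price in rows:
        by_period[period].append(price)

    medians = {}
    for period, prices in by_period.items():
        prices.sort()
        count = len(prices)
        # row_num starts at 1, as in the query
        middle = [
            price
            for row_num, price in enumerate(prices, start=1)
            if count / 2 <= row_num <= count / 2 + 1
        ]
        medians[period] = sum(middle) / len(middle)
    return medians


# Hypothetical listings: three sales in one month, two in the next
sample = [
    ("2020-01", 100), ("2020-01", 300), ("2020-01", 200),
    ("2020-02", 10), ("2020-02", 20),
]
print(grouped_median(sample))
```

    An odd count leaves exactly one row inside the window and an even count leaves two, so the AVG yields the median in both cases.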
    
  • 2020-11-22 04:36

    A comment on this page in the MySQL documentation has the following suggestion:

    -- (mostly) High Performance scaling MEDIAN function per group
    -- Median defined in http://en.wikipedia.org/wiki/Median
    --
    -- by Peter Hlavac
    -- 06.11.2008
    --
    -- Example Table:
    
    DROP table if exists table_median;
    CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
    COMMIT;
    
    
    INSERT INTO table_median (id, val) VALUES
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7);
    
    
    
    -- Calculating the MEDIAN
    SELECT @a := 0;
    SELECT
    id,
    AVG(val) AS MEDIAN
    FROM (
    SELECT
    id,
    val
    FROM (
    SELECT
    -- Create an index n for every id
    @a := (@a + 1) mod o.c AS shifted_n,
    IF(@a mod o.c=0, o.c, @a) AS n,
    o.id,
    o.val,
    -- the number of elements for every id
    o.c
    FROM (
    SELECT
    t_o.id,
    val,
    c
    FROM
    table_median t_o INNER JOIN
    (SELECT
    id,
    COUNT(1) AS c
    FROM
    table_median
    GROUP BY
    id
    ) t2
    ON (t2.id = t_o.id)
    ORDER BY
    t_o.id,val
    ) o
    ) a
    WHERE
    IF(
    -- if there is an even number of elements
    -- take the lower and the upper median
    -- and use AVG(lower,upper)
    c MOD 2 = 0,
    n = c DIV 2 OR n = (c DIV 2)+1,
    
    -- if it's an odd number of elements
    -- take the first if it's only one element
    -- or take the one in the middle
    IF(
    c = 1,
    n = 1,
    n = c DIV 2 + 1
    )
    )
    ) a
    GROUP BY
    id;
    
    -- Explanation:
    -- The Statement creates a helper table like
    --
    -- n id val count
    -- ----------------
    -- 1, 1, 1, 7
    -- 2, 1, 3, 7
    -- 3, 1, 4, 7
    -- 4, 1, 5, 7
    -- 5, 1, 6, 7
    -- 6, 1, 7, 7
    -- 7, 1, 8, 7
    --
    -- 1, 2, 4, 1
    
    -- 1, 3, 2, 2
    -- 2, 3, 5, 2
    --
    -- 1, 4, 1, 4
    -- 2, 4, 5, 4
    -- 3, 4, 7, 4
    -- 4, 4, 12, 4
    
    
    -- from there we can select the n-th element on the position: count div 2 + 1 
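
    As a cross-check of the position rule, here is a Python sketch (not MySQL; `median_positions` and `medians_per_id` are illustrative names) that applies it to the same rows as the INSERT statement in the script.

```python
from collections import defaultdict

# Same (id, val) rows as the INSERT statement in the SQL script
ROWS = [
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7),
]


def median_positions(c):
    """1-based positions to average for a group of c rows."""
    if c % 2 == 0:
        return [c // 2, c // 2 + 1]  # lower and upper median
    return [c // 2 + 1]  # the middle row; also covers c == 1


def medians_per_id(rows):
    groups = defaultdict(list)
    for group_id, val in rows:
        groups[group_id].append(val)
    result = {}
    for group_id, vals in sorted(groups.items()):
        vals.sort()
        picked = [vals[n - 1] for n in median_positions(len(vals))]
        result[group_id] = sum(picked) / len(picked)
    return result


print(medians_per_id(ROWS))
```

    Since 1 DIV 2 + 1 is 1, the separate c = 1 branch in the SQL gives the same position as the general odd case.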
    
  • 2020-11-22 04:37

    In MariaDB / MySQL:

    SELECT AVG(dd.val) as median_val
    FROM (
    SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
      FROM data d, (SELECT @rownum:=0) r
      WHERE d.val is NOT NULL
      -- put some where clause here
      ORDER BY d.val
    ) as dd
    WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
    

    Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

    Also, AVG(dd.val) and dd.row_number IN(...) are used so that the median is correct when there is an even number of records. Reasoning:

    SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
    SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
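
    The same index arithmetic as a Python sketch (not MySQL; `middle_rows` and `median` are illustrative names). Integer division stands in for FLOOR, and row numbers are 1-based as in the query.

```python
def middle_rows(total_rows):
    """1-based row numbers selected by the IN (...) list."""
    return (total_rows + 1) // 2, (total_rows + 2) // 2


def median(values):
    ordered = sorted(v for v in values if v is not None)  # WHERE d.val IS NOT NULL
    if not ordered:
        return None
    lo, hi = middle_rows(len(ordered))
    # For an odd count lo == hi, so this averages the middle value with itself
    return (ordered[lo - 1] + ordered[hi - 1]) / 2


print(middle_rows(3), middle_rows(4))
print(median([3, 1, 2]), median([4, 1, 3, 2]))
```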
    

    Finally, MariaDB 10.3.3+ contains a MEDIAN function, which is used as a window function.

  • 2020-11-22 04:38
    SELECT 
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                GROUP_CONCAT(field ORDER BY field),
                ',',
                ((
                    ROUND(
                        LENGTH(GROUP_CONCAT(field)) - 
                        LENGTH(
                            REPLACE(
                                GROUP_CONCAT(field),
                                ',',
                                ''
                            )
                        )
                    ) / 2) + 1
                )),
                ',',
                -1
            )
    FROM
        `table`  -- placeholder name; TABLE is a reserved word, so it needs backticks
    

    The above seems to work for me.
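
    A Python sketch of the string manipulation (not MySQL; `median_via_concat` is a made-up name). It counts the commas in the joined list to get the number of values, then cuts out the element at position commas/2 + 1. One assumption is labeled in the code: that MySQL rounds a fractional count argument half up.

```python
def median_via_concat(values):
    if not values:
        return None
    # GROUP_CONCAT(field ORDER BY field)
    joined = ",".join(str(v) for v in sorted(values))
    # LENGTH(GROUP_CONCAT(field)) - LENGTH(REPLACE(GROUP_CONCAT(field), ',', ''))
    commas = len(joined) - len(joined.replace(",", ""))
    # Assumption: a fractional count such as 2.5 is rounded half up to 3
    position = int(commas / 2 + 1 + 0.5)
    # SUBSTRING_INDEX(SUBSTRING_INDEX(joined, ',', position), ',', -1)
    return joined.split(",")[:position][-1]


print(median_via_concat([10, 2, 33]))
print(median_via_concat([1, 2, 3, 4]))
```

    Under that assumption the result is a string, and for an even number of values it is the upper of the two middle values, not their average.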
