How can I calculate the median of values in SQLite?

旧巷少年郎 2020-12-09 02:59

I'd like to calculate the median value in a numeric column. How can I do that in SQLite 4?

5 answers
  • 2020-12-09 03:37

    There is an extension pack of math and string functions for SQLite 3. It includes aggregate functions such as median.

    Getting it running takes more work than CL's answer, but it may be worthwhile if you think you will need any of the other functions.

    http://www.sqlite.org/contrib/download/extension-functions.c?get=25

    (The SQLite documentation includes a guide to compiling and loading run-time loadable extensions.)

    From the description:

    Provide mathematical and string extension functions for SQL queries using the loadable extensions mechanism. Math: acos, asin, atan, atn2, atan2, acosh, asinh, atanh, difference, degrees, radians, cos, sin, tan, cot, cosh, sinh, tanh, coth, exp, log, log10, power, sign, sqrt, square, ceil, floor, pi. String: replicate, charindex, leftstr, rightstr, ltrim, rtrim, trim, replace, reverse, proper, padl, padr, padc, strfilter. Aggregate: stdev, variance, mode, median, lower_quartile, upper_quartile.
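    extension-functions.c is C code that registers these aggregates through SQLite's C API. If you happen to be using SQLite from Python, a rough stand-in for median alone is a user-defined aggregate registered with the standard-library sqlite3 module, which needs no compiling. This is a swapped-in technique, not part of the extension: the class Median and the helper connect_with_median are hypothetical names, and median() exists only on connections where it has been registered. The sketch assumes numeric input and averages the two middle values for even-sized groups.

```python
import sqlite3


class Median:
    """Hypothetical median(x) aggregate for Python's sqlite3 module.

    NULLs are skipped (as AVG does). For an even number of values the
    result is the mean of the two middle values.
    """

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group; the return value becomes the SQL result.
        if not self.values:
            return None  # no non-NULL values -> NULL
        self.values.sort()
        mid = len(self.values) // 2
        if len(self.values) % 2 == 1:
            return self.values[mid]
        return (self.values[mid - 1] + self.values[mid]) / 2


def connect_with_median(path=":memory:"):
    """Open a connection and register the one-argument median(x) on it."""
    conn = sqlite3.connect(path)
    conn.create_aggregate("median", 1, Median)
    return conn


if __name__ == "__main__":
    conn = connect_with_median()
    conn.execute("CREATE TABLE t (name TEXT, value INT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6), ("C", 7)],
    )
    query = "SELECT name, median(value) FROM t GROUP BY name ORDER BY name"
    for row in conn.execute(query):
        print(row)
```

    Because the aggregate keeps every value of a group in memory, it suits modest group sizes; the C extension is the better fit if you need median() from the sqlite3 shell or from other languages.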

    UPDATE 2015-04-12: Fixing "undefined symbol: sinh"

    As mentioned in the comments, this extension may not work properly even though it compiles successfully.

    For example, on Linux you might compile it, copy the resulting .so file to /usr/local/lib, and then run .load /usr/local/lib/libsqlitefunctions from the sqlite3 shell, only to get this error:

    Error: /usr/local/lib/libsqlitefunctions.so: undefined symbol: sinh
    

    Compiling it this way seems to work:

    gcc -fPIC -shared extension-functions.c -o libsqlitefunctions.so -lm
    

    Copying this .so file to /usr/local/lib and loading it then produces no such error:

    sqlite> .load /usr/local/lib/libsqlitefunctions
    
    sqlite> select cos(pi()/4.0);
    ---> 0.707106781186548
    

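    The order of the options to gcc apparently matters in this case, most likely because the linker processes its arguments left to right: when -lm comes before extension-functions.c, nothing needs libm yet, so on toolchains that link with --as-needed by default (as Ubuntu's does) libm is not recorded as a dependency of the .so, and sinh is left unresolved at load time.

    Credit for noticing this goes to Ludvick Lidicky's comment on this blog post.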

  • 2020-12-09 03:38

    Suppose there is a log table with timestamp, label, and latency columns, and we want the median latency of each label, grouped by timestamp (its first 8 characters). Format every latency value as a 15-character string with leading zeros, concatenate the sorted strings, and cut out the value(s) in the middle position: that is the median.

    select L, T, --V,
           case when N % 2 = 0 then
           ( substr( V, ( C - 1 ) * 15 + 1, 15) * 1 + substr( V, C * 15 + 1, 15) * 1 ) / 2.0
           else
            substr( V, C * 15 + 1, 15) * 1
           end as MEDST
    from (
        -- N = number of values, C = N / 2 = 0-based index of the (upper) middle value
        select L, T, group_concat(ST, '') as V, count(ST) as N, count(ST) / 2 as C
        from (
            select label as L, 
                   substr( timeStamp, 1, 8) * 1 as T, 
                   printf( '%015d',latency) as ST
            from log
            where label not like '%-%' and responseMessage = 'OK'
            order by L, T, ST ) as XX
        group by L, T
        ) as YY
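    To try the idea out, here is a sketch that runs the query from Python's built-in sqlite3 module against a small, made-up log table (the rows and the YYYYMMDDhhmmss timeStamp format are assumptions). This version groups by label and day (L, T) so each group's strings are concatenated in latency order, tests the parity of the full count N rather than of C = N/2, divides by 2.0 so even-sized groups keep the fractional half, and uses '' instead of "" as the separator. Like the original, it relies on group_concat() following the subquery's ORDER BY, which SQLite does in practice even though its documentation calls the concatenation order arbitrary.

```python
import sqlite3

# Median latency per label and per day via zero-padded string concatenation.
MEDIAN_SQL = """
select L, T,
       case when N % 2 = 0 then
       ( substr( V, ( C - 1 ) * 15 + 1, 15) * 1 + substr( V, C * 15 + 1, 15) * 1 ) / 2.0
       else
        substr( V, C * 15 + 1, 15) * 1
       end as MEDST
from (
    select L, T, group_concat(ST, '') as V, count(ST) as N, count(ST) / 2 as C
    from (
        select label as L,
               substr( timeStamp, 1, 8) * 1 as T,
               printf( '%015d', latency) as ST
        from log
        where label not like '%-%' and responseMessage = 'OK'
        order by L, T, ST ) as XX
    group by L, T
    ) as YY
order by L, T
"""

# Hypothetical sample rows: (timeStamp, label, latency, responseMessage).
SAMPLE_ROWS = [
    ("20201209031500", "login", 30, "OK"),
    ("20201209031600", "login", 10, "OK"),
    ("20201209031700", "login", 20, "OK"),
    ("20201210090000", "login", 300, "OK"),
    ("20201210090100", "login", 100, "OK"),
    ("20201209100000", "search", 7, "OK"),
    ("20201209100100", "search", 5, "OK"),
    ("20201209100200", "search", 999, "ERR"),  # filtered out: not OK
    ("20201209100300", "a-b", 1, "OK"),        # filtered out: label contains '-'
]


def median_latencies(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (timeStamp TEXT, label TEXT, latency INT, responseMessage TEXT)"
    )
    conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(MEDIAN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in median_latencies(SAMPLE_ROWS):
        print(row)
```

    The fixed width matters: only when every value has the same number of digits does string order match numeric order and does value i start at character i * 15 + 1. Negative latencies would break both assumptions.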
    
  • 2020-12-09 03:39

    Dixtroy provided the best solution, using group_concat(). Here is a complete example:

    DROP TABLE IF EXISTS [t];
    CREATE TABLE [t] (name, value INT);
    INSERT INTO t VALUES ('A', 2);
    INSERT INTO t VALUES ('A', 3);
    INSERT INTO t VALUES ('B', 4);
    INSERT INTO t VALUES ('B', 5);
    INSERT INTO t VALUES ('B', 6);
    INSERT INTO t VALUES ('C', 7);
    

    This results in the following table:

    name|value
    A|2
    A|3
    B|4
    B|5
    B|6
    C|7
    

    Now we use Dixtroy's query, slightly modified:

    SELECT name, --string_list, count, middle,
        CASE WHEN count%2=0 THEN
            0.5 * substr(string_list, middle-10, 10) + 0.5 * substr(string_list, middle, 10)
        ELSE
            1.0 * substr(string_list, middle, 10)
        END AS median
    FROM (
        SELECT name, 
            group_concat(value_string,'') AS string_list,
            count() AS count, 
            1 + 10*(count()/2) AS middle
        FROM (
            SELECT name, 
                printf( '%010d',value) AS value_string
            FROM [t]
            ORDER BY name,value_string
        )
        GROUP BY name
    );
    

    ...and get this result:

    name|median
    A|2.5
    B|5.0
    C|7.0
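    A different technique, not Dixtroy's, avoids both the fixed 10-digit width and the reliance on group_concat() ordering: number the rows of each group with window functions and average the one or two middle rows. Window functions need SQLite 3.25.0 or later. The sketch runs it from Python's sqlite3 module on the same t table; the helper name window_medians is hypothetical.

```python
import sqlite3

# Median per name using ROW_NUMBER() and COUNT() window functions.
# For cnt rows, the middle positions are (cnt+1)/2 and (cnt+2)/2 with integer
# division: the same position when cnt is odd, two adjacent ones when even.
WINDOW_MEDIAN_SQL = """
SELECT name, AVG(value) AS median
FROM (
    SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY value) AS rn,
           COUNT(*)     OVER (PARTITION BY name)                AS cnt
    FROM t
    WHERE value IS NOT NULL
)
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY name
ORDER BY name
"""


def window_medians(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name, value INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(WINDOW_MEDIAN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    data = [("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6), ("C", 7)]
    for row in window_medians(data):
        print(row)
```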
    
  • 2020-12-09 03:39

    CL's SELECT AVG(x) query returns only the year for dates stored as YYYY-MM-DD text, because AVG converts each string to a number using only its numeric prefix (the year). So I tweaked CL's solution slightly to handle dates:

    SELECT DATE(JULIANDAY(MIN(MyDate)) + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate)))/2) as Median_Date
    FROM (
       SELECT MyDate
          FROM MyTable
          ORDER BY MyDate
          LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2) -- odd 1, even 2
          OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable)
    );
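    A quick way to see both the AVG() problem and the fix is to run them side by side from Python's sqlite3 module. The table and column names follow the answer; the helper median_date and the sample dates are made up. When the two middle dates are an odd number of days apart, the midpoint falls at noon and DATE() truncates it to the earlier day.

```python
import sqlite3

MEDIAN_DATE_SQL = """
SELECT DATE(JULIANDAY(MIN(MyDate))
            + (JULIANDAY(MAX(MyDate)) - JULIANDAY(MIN(MyDate))) / 2) AS Median_Date
FROM (
   SELECT MyDate
      FROM MyTable
      ORDER BY MyDate
      LIMIT 2 - ((SELECT COUNT(*) FROM MyTable) % 2) -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2 FROM MyTable)
)
"""


def median_date(dates):
    """Return (naive AVG result, median date) for a list of YYYY-MM-DD strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (MyDate TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(d,) for d in dates])
    naive = conn.execute("SELECT AVG(MyDate) FROM MyTable").fetchone()[0]
    median = conn.execute(MEDIAN_DATE_SQL).fetchone()[0]
    conn.close()
    return naive, median


if __name__ == "__main__":
    print(median_date(["2020-01-10", "2020-01-01", "2020-01-03"]))
    print(median_date(["2020-01-01", "2020-01-03"]))
```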
    
  • 2020-12-09 03:49

    Let's say that the median is the element in the middle of an ordered list.

    SQLite (4 or 3) does not have any built-in function for that, but it's possible to do this by hand:

    SELECT x
    FROM MyTable
    ORDER BY x
    LIMIT 1
    OFFSET (SELECT COUNT(*)
            FROM MyTable) / 2
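    This query returns a single row, so with an even number of records it picks the upper of the two middle values: OFFSET COUNT(*)/2 skips exactly half of them. A small check from Python's sqlite3 module (the helper middle_element is a made-up name):

```python
import sqlite3

# CL's single-row query: the element at 0-based position COUNT(*)/2.
MIDDLE_SQL = """
SELECT x
FROM MyTable
ORDER BY x
LIMIT 1
OFFSET (SELECT COUNT(*)
        FROM MyTable) / 2
"""


def middle_element(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (x)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(v,) for v in values])
    row = conn.execute(MIDDLE_SQL).fetchone()
    conn.close()
    return None if row is None else row[0]  # empty table -> no row


if __name__ == "__main__":
    print(middle_element([3, 1, 2]))     # odd count
    print(middle_element([4, 1, 3, 2]))  # even count: upper middle
```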
    

    When there is an even number of records, it is common to define the median as the average of the two middle records. In this case, the average can be computed like this:

    SELECT AVG(x)
    FROM (SELECT x
          FROM MyTable
          ORDER BY x
          LIMIT 2
          OFFSET (SELECT (COUNT(*) - 1) / 2
                  FROM MyTable))
    

    Combining the odd and even cases then results in this:

    SELECT AVG(x)
    FROM (SELECT x
          FROM MyTable
          ORDER BY x
          LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2    -- odd 1, even 2
          OFFSET (SELECT (COUNT(*) - 1) / 2
                  FROM MyTable))
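    One caveat: COUNT(*) also counts NULLs, and ORDER BY x sorts NULLs first, so NULLs in x shift the selected rows toward the low end. A variant that ignores NULLs (an adjustment, not part of the answer above) counts with COUNT(x) and filters NULLs out in the subquery. The sketch compares both from Python's sqlite3 module; the helper run_median is hypothetical.

```python
import sqlite3

# CL's combined query, unchanged.
MEDIAN_SQL = """
SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      ORDER BY x
      LIMIT 2 - (SELECT COUNT(*) FROM MyTable) % 2    -- odd 1, even 2
      OFFSET (SELECT (COUNT(*) - 1) / 2
              FROM MyTable))
"""

# Same idea, but NULLs are neither counted nor selected.
MEDIAN_IGNORING_NULLS_SQL = """
SELECT AVG(x)
FROM (SELECT x
      FROM MyTable
      WHERE x IS NOT NULL
      ORDER BY x
      LIMIT 2 - (SELECT COUNT(x) FROM MyTable) % 2    -- odd 1, even 2
      OFFSET (SELECT (COUNT(x) - 1) / 2
              FROM MyTable))
"""


def run_median(sql, values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (x)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(v,) for v in values])
    result = conn.execute(sql).fetchone()[0]  # AVG over no rows -> None
    conn.close()
    return result


if __name__ == "__main__":
    data = [None, None, 1, 2, 3]
    print(run_median(MEDIAN_SQL, data))
    print(run_median(MEDIAN_IGNORING_NULLS_SQL, data))
```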
    