问题
StackOverflow to the rescue!, I need to find the medians for five columns at once, in one query call.
The median calculations below work for single columns, but when combined, multiple uses of "rownum" throws the query off. How can I update this to work for multiple columns? THANK YOU. It's to create a web tool where nonprofits can compare their financial metrics to user-defined peer groups.
SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_wages.totalwages_pctoftotexp
FROM data_990_c3 d_wages, (
SELECT @rownum :=0
)r_wages
WHERE totalwages_pctoftotexp >0
ORDER BY d_wages.totalwages_pctoftotexp
) AS t1_wages, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_wages
WHERE totalwages_pctoftotexp >0
) AS t2_wages
WHERE 1
AND t1_wages.row_number = FLOOR( total_rows /2 ) +1
--- [that was one median, below is another] ---
SELECT t1_solvent.solvent_days AS median_solvent_days
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , d_solvent.solvent_days
FROM data_990_c3 d_solvent, (
SELECT @rownum :=0
)r_solvent
WHERE solvent_days >0
ORDER BY d_solvent.solvent_days
) AS t1_solvent, (
SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_solvent
WHERE solvent_days >0
) AS t2_solvent
WHERE 1
AND t1_solvent.row_number = FLOOR( total_rows /2 ) +1
[those are two - there are five in total I'll eventually need to find medians for at once]
回答1:
This kind of thing is a big pain in the neck in MySQL. You might be wise to use the free Oracle Express Edition or postgreSQL if you're going to do tonnage of this statistical ranking work. They all have MEDIAN(value)
aggregate functions that are either built-in or available as extensions. Here's a little sqlfiddle demonstrating that. http://sqlfiddle.com/#!4/53de8/6/0
But you didn't ask about that.
In MySQL, your basic problem is the scope of variables like @rownum. You also have a pivoting problem: that is, you need to turn rows of your query into columns.
Let's tackle the pivot problem first. What you're going to do is create a union of several big fat queries. For example:
SELECT 'median_wages' AS tag, wages AS value
FROM (big fat query making median wages) A
UNION
SELECT 'median_volunteer_hours' AS tag, hours AS value
FROM (big fat query making median volunteer hours) B
UNION
SELECT 'median_solvent_days' AS tag, days AS value
FROM (big fat query making median solvency days) C
So here are your results in a table of tag / value pairs. You can pivot that table like so, to get one row with a value in each column.
SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END
) AS median_wages,
SELECT SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END
) AS median_volunteer_hours,
SELECT SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END
) AS median_solvent_days
FROM (
/* the above gigantic UNION query */
) Q
That's how you pivot up rows (from the UNION query in this case) to columns. Here's a tutorial on the topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523
Now we need to tackle the median-computing subqueries. The code in your question looks pretty good. I don't have your data so it's hard for me to evaluate it.
But you need to avoid reusing the @rownum variable. Call it @rownum1 in one of your queries, @rownum2 in the next one, and so on. Here's a dinky sql fiddle doing just one of these. http://sqlfiddle.com/#!2/2f770/1/0
Now let's build it up a bit, doing two different medians. Here's the fiddle http://sqlfiddle.com/#!2/2f770/2/0 and here's the UNION query. Notice the second half of the union query uses @rownum2
instead of @rownum
.
Finally, here's the full query with the pivoting. http://sqlfiddle.com/#!2/2f770/13/0
SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston,
SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx
FROM (
SELECT 'Boston' AS tag, pop AS VALUE
FROM (
SELECT @rownum := @rownum +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum :=0)r
WHERE pop >0 AND city = 'Boston'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Boston'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
UNION ALL
SELECT 'Bronx' AS tag, pop AS VALUE
FROM (
SELECT @rownum2 := @rownum2 +1 AS `row_number` , pop
FROM pops,
(SELECT @rownum2 :=0)r
WHERE pop >0 AND city = 'Bronx'
ORDER BY pop
) AS ordered_rows,
(
SELECT COUNT( * ) AS total_rows
FROM pops
WHERE pop >0 AND city = 'Bronx'
) AS rowcount
WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
) D
This is just two medians. You need five. I think it's easy to make the case that this median computation is absurdly difficult to do in MySQL in a single query.
回答2:
Suppose you have a table with three columns like table(key, value1, value2).
this query gives you the median value of the two value columns for each key:
SELECT key,
((array_agg(value1 order by value1 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value1 order by value1 asc) )[ceiling( (count(*)+1)::float/2) ] )/2,
((array_agg(value2 order by value2 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value2 order by value2 asc) )[ceiling( (count(*)+1)::float/2) ] )/2
FROM table
GROUP BY key
来源:https://stackoverflow.com/questions/17447917/calculate-medians-for-multiple-columns-in-the-same-table-in-one-query-call