quartile | 易学教程

How does pandas calculate quartiles?

Question: I have a very simple dataframe:

    df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])

df.median() returns 23, which is right: of the 19 values in the list, 23 is the 10th value (9 values before it and 9 values after it). I tried to calculate the 1st and 3rd quartiles as:

    df.quantile([.25, .75])
           val
    0.25  20.0
    0.75  23.5

I would have expected the 1st quartile, taken as the median of the 9 values below the median, to be 19, but as you can see above, Python says it is 20. Similarly, for
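The difference between the two answers comes down to the quantile definition used. The sketch below is plain Python, not pandas internals: `linear_quantile` and `half_median_quartiles` are hypothetical helper names, and the first one assumes the linear-interpolation rule that pandas documents as the default for `quantile`. It compares that rule with the "median of each half" rule the question expects.

```python
import math
import statistics

def linear_quantile(values, q):
    """Quantile by linear interpolation between neighbouring sorted values."""
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional zero-based position
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def half_median_quartiles(values):
    """Quartiles as medians of the lower and upper halves (median excluded when n is odd)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(upper)

vals = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]

q1_linear = linear_quantile(vals, 0.25)   # position 4.5: halfway between 19 and 21
q3_linear = linear_quantile(vals, 0.75)   # position 13.5: halfway between 23 and 24
q1_half, q3_half = half_median_quartiles(vals)

print(q1_linear, q3_linear)  # 20.0 23.5
print(q1_half, q3_half)      # 19 24
```

In pandas itself, the `interpolation` argument of `quantile` (for example `interpolation='lower'`) selects a different rule when the position falls between two data points.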

How to output different 25th, 50th, 75th percentiles in single Teradata query?

Question: I got stuck a few hours ago on something similar and worked out less messy code for outputting the 25th, 50th, and 75th percentiles in a single Teradata query. It can be extended further to produce a "5 point summary". For the minimum and maximum, change the static values according to your population estimate. Someone somewhere had asked for an elegant approach, so I am sharing mine. Here's the code: SELECT MAX(PER_MIN) AS PER_MIN, MAX(PER_25) AS PER_25, MAX(PER_50) AS PER_50, MAX(PER_75) AS PER_75, MAX

Continuous quantiles of a scatterplot

Question: I have a data set for which I plotted a regression (using ggplot2's stat_smooth):

    ggplot(data = mydf, aes(x=time, y=pdm)) + geom_point() + stat_smooth(col="red")

I'd also like to show the quantiles (or only the quartiles, if that is simpler) using the same method. All I have managed to get is the following:

    ggplot(data = mydf, aes(x=time, y=pdm, z=surface)) + geom_point() + stat_smooth(col="red") + stat_quantile(quantiles = c(0.25,0.75))

Unfortunately, I can't put method="loess" in

Obtaining nice cuts in Hmisc with cut2 (without the [ ) signs )

I'm currently trying to cut data neatly using the Hmisc package, as in the example below:

    dummy <- data.frame(important_variable=seq(1:1000))
    require(Hmisc)
    dummy$cuts <- cut2(dummy$important_variable, g = 4)

The produced cuts are correct with respect to the values:

      important_variable       cuts
    1                  1 [  1, 251)
    2                  2 [  1, 251)
    3                  3 [  1, 251)
    4                  4 [  1, 251)
    5                  5 [  1, 251)
    6                  6 [  1, 251)

    > table(dummy$cuts)
    [  1, 251) [251, 501) [501, 751) [751,1000]
           250        250        250        250

However, I would like the data to be presented slightly differently. For instance instead of [ 1, 251 ) [ 251, 501 ) I would prefer the
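To make explicit where interval labels such as [1, 251) come from, here is a minimal Python sketch of equal-count grouping. It is not cut2's algorithm: `equal_count_bins` is a hypothetical helper that assumes distinct values (so no ties straddle a boundary) and builds each label from the smallest value of a group and the smallest value of the next one. Once the bounds are available as numbers, any other label format is a matter of changing the format strings.

```python
def equal_count_bins(values, groups):
    """Split sorted values into `groups` equal-count bins; return (labels, counts)."""
    data = sorted(values)
    n = len(data)
    # Index of the first element of each group.
    starts = [i * n // groups for i in range(groups)]
    lowers = [data[s] for s in starts]
    labels, counts = [], []
    for i, low in enumerate(lowers):
        if i + 1 < groups:
            # Half-open interval: the upper bound belongs to the next group.
            labels.append(f"[{low}, {lowers[i + 1]})")
            counts.append(starts[i + 1] - starts[i])
        else:
            # The last interval is closed so the maximum is included.
            labels.append(f"[{low}, {data[-1]}]")
            counts.append(n - starts[i])
    return labels, counts

labels, counts = equal_count_bins(range(1, 1001), 4)
print(labels)  # ['[1, 251)', '[251, 501)', '[501, 751)', '[751, 1000]']
print(counts)  # [250, 250, 250, 250]
```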

Trying to calculate quartiles in MDX

My data looks like this:

    ID                                   | PersonID | CompanyID | DateID | Throughput | AmountType
    33F467AC-F35B-4F24-A05B-FC35CF005981 | 7        | 53        | 200802 | 3          | 0
    04EE0FF0-511D-48F5-AA58-7600B3A69695 | 18       | 4         | 201309 | 5          | 0
    AB058AA5-6228-4E7C-9469-55827A5A34C3 | 25       | 69        | 201108 | 266        | 0

with around a million rows. The columns named *ID refer to other tables, so they can be used as dimensions. I have an OLAP cube with the Throughput column as the measure and the rest as dimensions. I want to calculate quartiles 1 and 3 of the Throughput measure. I followed this guide: https://electrovoid.wordpress.com/2011/06/24/ssas-quartile/
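When checking a cube calculation like this, it can help to have reference values computed outside the cube. The sketch below uses Python's standard `statistics.quantiles` on the three Throughput values shown above. The choice of `method="inclusive"` (linear interpolation over the observed range) is an assumption; the linked guide may define quartiles differently, so a mismatch does not by itself mean the MDX is wrong.

```python
import statistics

# Throughput values from the three sample rows in the question.
throughput = [3, 5, 266]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(throughput, n=4, method="inclusive")

print(q1, q2, q3)  # 4.0 5.0 135.5
```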