quartile

How does pandas calculate quartiles?

十年热恋 submitted on 2019-12-11 19:27:22
Question: I have a very simple dataframe:

    df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])

df.median() returns 23, which is right: of the 19 values in the list, 23 is the 10th (9 values before it and 9 values after it). I tried to calculate the 1st and 3rd quartiles with:

    df.quantile([.25, .75])

          val
    0.25  20.0
    0.75  23.5

I would have expected the 1st quartile, taken from the 9 values below the median, to be 19, but as you can see above, Python says it is 20. Similarly, for
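The 20.0 and 23.5 in the excerpt above are consistent with linear interpolation between neighbouring sorted values, which is the default `interpolation='linear'` setting of pandas' `quantile`. The sketch below reproduces that arithmetic in plain Python so the steps are visible; the helper name `linear_quantile` is hypothetical and is not part of pandas.

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values.

    Hypothetical helper for illustration; it mirrors the position rule
    (n - 1) * q used by linear interpolation, not pandas' actual code.
    """
    data = sorted(values)
    pos = (len(data) - 1) * q      # fractional index into the sorted data
    lower = math.floor(pos)        # index of the value just below pos
    upper = math.ceil(pos)         # index of the value just above pos
    fraction = pos - lower         # how far pos lies between the two
    return data[lower] + (data[upper] - data[lower]) * fraction

vals = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]

q1 = linear_quantile(vals, 0.25)      # pos = 4.5, halfway between 19 and 21
median = linear_quantile(vals, 0.5)   # pos = 9, exactly the 10th value
q3 = linear_quantile(vals, 0.75)      # pos = 13.5, halfway between 23 and 24
print(q1, median, q3)
```

With 19 values, the 0.25 position is 18 * 0.25 = 4.5, so the result falls halfway between the 5th value (19) and the 6th value (21). If a value taken directly from the data is preferred, `quantile` also accepts other `interpolation` options such as `'lower'`, which for this data would pick 19 at q = 0.25.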

How to output different 25th, 50th, 75th percentiles in single Teradata query?

。_饼干妹妹 submitted on 2019-12-11 08:38:08
Question: I got stuck a few hours ago on something similar and worked out less messy code for outputting the 25th, 50th, and 75th percentiles in a single Teradata query. It can be extended further to produce a "5 point summary". For the minimum and maximum, change the static values according to your population estimate. Someone had asked for an elegant approach, so I am sharing mine. Here's the code:

    SELECT MAX(PER_MIN) AS PER_MIN, MAX(PER_25) AS PER_25, MAX(PER_50) AS PER_50, MAX(PER_75) AS PER_75, MAX

Continuous quantiles of a scatterplot

有些话、适合烂在心里 submitted on 2019-12-10 19:31:45
Question: I have a data set for which I graphed a regression (using ggplot2's stat_smooth):

    ggplot(data = mydf, aes(x=time, y=pdm)) + geom_point() + stat_smooth(col="red")

I'd also like to have the quantiles (if it's simpler, only the quartiles will do) using the same method. All I manage to get is the following:

    ggplot(data = mydf, aes(x=time, y=pdm, z=surface)) + geom_point() + stat_smooth(col="red") + stat_quantile(quantiles = c(0.25,0.75))

Unfortunately, I can't put method="loess" in

Obtaining nice cuts in Hmisc with cut2 (without the [ ) signs )

佐手、 submitted on 2019-12-07 13:22:55
Question: I'm currently trying to neatly cut data using the Hmisc package, as in the example below:

    dummy <- data.frame(important_variable=seq(1:1000))
    require(Hmisc)
    dummy$cuts <- cut2(dummy$important_variable, g = 4)

The produced cuts are correct with respect to the values:

      important_variable       cuts
    1                  1 [  1, 251)
    2                  2 [  1, 251)
    3                  3 [  1, 251)
    4                  4 [  1, 251)
    5                  5 [  1, 251)
    6                  6 [  1, 251)

    > table(dummy$cuts)

    [  1, 251) [251, 501) [501, 751) [751,1000]
           250        250        250        250

However, I would like for the data to

Obtaining nice cuts in Hmisc with cut2 (without the [ ) signs )

牧云@^-^@ submitted on 2019-12-05 19:06:00
I'm currently trying to neatly cut data using the Hmisc package, as in the example below:

    dummy <- data.frame(important_variable=seq(1:1000))
    require(Hmisc)
    dummy$cuts <- cut2(dummy$important_variable, g = 4)

The produced cuts are correct with respect to the values:

      important_variable       cuts
    1                  1 [  1, 251)
    2                  2 [  1, 251)
    3                  3 [  1, 251)
    4                  4 [  1, 251)
    5                  5 [  1, 251)
    6                  6 [  1, 251)

    > table(dummy$cuts)

    [  1, 251) [251, 501) [501, 751) [751,1000]
           250        250        250        250

However, I would like the data to be presented slightly differently. For instance, instead of [ 1, 251 ) [ 251, 501 ) I would prefer the

Trying to calculate quartiles in MDX

心不动则不痛 submitted on 2019-12-02 08:28:50
My data looks like this:

    ID                                   |PersonID |CompanyID |DateID |Throughput |AmountType
    33F467AC-F35B-4F24-A05B-FC35CF005981 |7        |53        |200802 |3          |0
    04EE0FF0-511D-48F5-AA58-7600B3A69695 |18       |4         |201309 |5          |0
    AB058AA5-6228-4E7C-9469-55827A5A34C3 |25       |69        |201108 |266        |0

with around a million rows. The columns named *ID refer to other tables, so they can be used as dimensions. I have an OLAP cube with the Throughput column as the measure and the rest as dimensions. I want to calculate quartiles 1 and 3 of the Throughput measure. I followed this guide: https://electrovoid.wordpress.com/2011/06/24/ssas-quartile/