Pandas describe vs scipy.stats percentileofscore with NaN?

被撕碎了的回忆 2021-01-29 07:54

I'm running into a weird situation where the percentile markers from pandas describe() disagree with scipy.stats.percentileofscore. I think NaNs are the cause.

My df is:

2 Answers
  •  孤独总比滥情好
    2021-01-29 07:56

    The answer is very simple.

    There is no universally accepted formula for computing percentiles, in particular when your data contains ties or when it cannot be split evenly into equal-size buckets.

    For instance, have a look at the documentation for quantile in R, which lists nine different types of formulas! https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
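The same spread of definitions shows up in Python. The sketch below is only an illustration on made-up toy data (the names `values` and `tied` are hypothetical and are not the asker's DataFrame). It computes one quantile under three of the interpolation rules that pandas accepts, and one percentile rank under the four `kind` options of scipy.stats.percentileofscore.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data, chosen so the arithmetic is easy to check by hand.
values = pd.Series([1, 2, 3, 4])

# 25th percentile under three of the interpolation rules pandas supports.
# The target position is (n - 1) * q = 0.75, i.e. between 1 and 2.
q_linear = values.quantile(0.25, interpolation="linear")  # 1.75
q_lower = values.quantile(0.25, interpolation="lower")    # 1
q_higher = values.quantile(0.25, interpolation="higher")  # 2

# Percentile rank of a tied score under scipy's four conventions.
tied = [1, 2, 2, 3]
ranks = {
    kind: stats.percentileofscore(tied, 2, kind=kind)
    for kind in ("rank", "weak", "strict", "mean")
}
# rank: 62.5, weak: 75.0, strict: 25.0, mean: 50.0

print(q_linear, q_lower, q_higher)
print(ranks)
```

With only four values, the "25th percentile" ranges from 1 to 2 and the percentile rank of the tied score ranges from 25 to 75, depending purely on the convention chosen.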

    In the end, it comes down to understanding which formula is used and whether the differences are big enough to be a problem in your case.
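One way to check how much the choice matters is to take a marker from describe() and feed it back into percentileofscore. The sketch below uses a hypothetical Series `s`, since the asker's data is not shown. describe() excludes NaN values. As far as I know, the way percentileofscore treats NaN has differed between SciPy releases, so this example drops NaN explicitly instead of relying on the default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data with missing values (not the asker's DataFrame).
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, np.nan, 5.0])

# describe() skips NaN, so its statistics are based on 5 values, not 7.
desc = s.describe()
n_used = desc["count"]        # 5.0
median_marker = desc["50%"]   # 3.0

# Drop NaN explicitly so both libraries see the same 5 values.
clean = s.dropna()

# Round trip: which percentile rank does the 50% marker get back?
back_rank = stats.percentileofscore(clean, median_marker)               # default kind="rank": 60.0
back_mean = stats.percentileofscore(clean, median_marker, kind="mean")  # 50.0

print(n_used, median_marker, back_rank, back_mean)
```

Even with NaN removed, the 50% marker maps back to 60 under the default `kind="rank"` and to 50 only under `kind="mean"`, so a mismatch between the two libraries does not by itself point to a NaN problem.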
