How should the interquartile range be calculated in Python?

Submitted by 时间秒杀一切 on 2019-12-21 05:04:11

Question


I have a list of numbers [1, 2, 3, 4, 5, 6, 7] and I want a function that returns the interquartile range of this list. The interquartile range is the difference between the upper and lower quartiles. I have tried to calculate it manually, with NumPy functions, and with Wolfram Alpha. All three answers are different, and I do not know why.

My attempt in Python is as follows:

>>> import numpy
>>> a = numpy.array([1, 2, 3, 4, 5, 6, 7])
>>> numpy.percentile(a, 25)
2.5
>>> numpy.percentile(a, 75)
5.5
>>> numpy.percentile(a, 75) - numpy.percentile(a, 25) # IQR
3.0

My attempt in Wolfram Alpha is as follows:

  • "first quartile 1, 2, 3, 4, 5, 6, 7": 2.25
  • "third quartile 1, 2, 3, 4, 5, 6, 7": 5.75
  • (comment: 5.75 - 2.25 = 3.5)
  • "interquartile range 1, 2, 3, 4, 5, 6, 7": ~3.5

So, I find that the values returned by NumPy and Wolfram Alpha for what I think are the first quartile, the third quartile and the interquartile range are not consistent. Why is this? What should I be doing in Python to calculate the interquartile range correctly?

As far as I am aware, the interquartile range of [1, 2, 3, 4, 5, 6, 7] should be the following:

median(5, 6, 7) - median(1, 2, 3) = 4.
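The manual calculation above can be sketched as a small function. This is only an illustration of the "median of each half" definition; the name iqr_median_of_halves is made up here, and dropping the middle element for odd-length input is an assumption based on the example in the question.

```python
from statistics import median

def iqr_median_of_halves(values):
    """IQR as median(upper half) - median(lower half).

    Assumption: when the number of values is odd, the middle element
    belongs to neither half, as in the manual example above.
    """
    data = sorted(values)
    half = len(data) // 2
    if half == 0:
        raise ValueError("need at least two values")
    lower = data[:half]    # the smallest `half` values
    upper = data[-half:]   # the largest `half` values
    return median(upper) - median(lower)

print(iqr_median_of_halves([1, 2, 3, 4, 5, 6, 7]))  # 4
```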

Answer 1:


You have 7 numbers which you are attempting to split into quartiles. Because 7 is not divisible by 4, there are a couple of different ways to do this, as mentioned here.

Your way is the first given by that link; Wolfram Alpha seems to be using the third. NumPy is doing basically the same thing as Wolfram Alpha, but it interpolates based on percentiles (as shown here) rather than quartiles, so it gets a different answer. You can choose how NumPy handles this using the interpolation option (I tried to link to the documentation but apparently I'm only allowed two links per post).

You'll have to choose which definition you prefer for your application.
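To make the difference between definitions concrete, here is a rough pure-Python sketch that interpolates between sorted values at a fractional position, where only the position formula changes from rule to rule. The function name quantile and the rule labels are chosen for this illustration. The "hazen" position n*p + 0.5 happens to reproduce the 2.25 and 5.75 quoted from Wolfram Alpha in the question, though that does not prove which rule Wolfram Alpha actually uses.

```python
import math

def quantile(values, p, rule="linear"):
    """Quantile by linear interpolation at a 1-based fractional position h.

    The rule names are labels for this sketch; each one only changes
    how h is computed from n and p.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if rule == "linear":       # matches numpy.percentile's default output
        h = (n - 1) * p + 1
    elif rule == "hazen":      # Hazen plotting position
        h = n * p + 0.5
    elif rule == "weibull":    # Weibull plotting position
        h = (n + 1) * p
    else:
        raise ValueError("unknown rule: " + rule)
    h = min(max(h, 1.0), float(n))   # keep the position inside the data
    lo = math.floor(h)
    hi = min(lo + 1, n)
    frac = h - lo
    return data[lo - 1] + frac * (data[hi - 1] - data[lo - 1])

data = [1, 2, 3, 4, 5, 6, 7]
for rule in ("linear", "hazen", "weibull"):
    q1 = quantile(data, 0.25, rule)
    q3 = quantile(data, 0.75, rule)
    print(rule, q1, q3, q3 - q1)
# linear 2.5 5.5 3.0
# hazen 2.25 5.75 3.5
# weibull 2.0 6.0 4.0
```

For this particular list the three rules give interquartile ranges of 3.0, 3.5 and 4.0, which are exactly the three conflicting answers in the question.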




Answer 2:


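Version 1.9 of NumPy added a handy 'interpolation' argument to numpy.percentile that helps you get to 4. (Since NumPy 1.22 the same option is called 'method', and 'interpolation' is deprecated.)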

import numpy

a = numpy.array([1, 2, 3, 4, 5, 6, 7])
# 'higher' and 'lower' pick the neighbouring data point instead of interpolating
numpy.percentile(a, 75, interpolation='higher') - numpy.percentile(a, 25, interpolation='lower')
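A version-tolerant variant of the same idea might look like the sketch below. It assumes NumPy is installed; the helper name iqr_outer is made up for this example. It tries the newer 'method' keyword first and falls back to 'interpolation' on older releases.

```python
import numpy as np

def iqr_outer(values):
    """Q3 rounded up to a data point minus Q1 rounded down to a data point."""
    a = np.asarray(values)
    try:
        # NumPy 1.22 and later
        q3 = np.percentile(a, 75, method="higher")
        q1 = np.percentile(a, 25, method="lower")
    except TypeError:
        # older NumPy releases only know the 'interpolation' keyword
        q3 = np.percentile(a, 75, interpolation="higher")
        q1 = np.percentile(a, 25, interpolation="lower")
    return q3 - q1

print(iqr_outer([1, 2, 3, 4, 5, 6, 7]))  # 4
```

Note that this matches the median-of-halves result for this list but not in general. For [1, 2, ..., 9] the 25th percentile falls exactly on the value 3, whereas the median of the lower half [1, 2, 3, 4] is 2.5.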



Answer 3:


Not perfect but these functions should approximate it:

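def quartile_1(values):
    # element roughly a quarter of the way through the sorted data
    return sorted(values)[int(len(values) * .25)]

def median(values):
    # integer division so the index is an int in Python 3;
    # for even-length input this returns the upper of the two middle values
    return sorted(values)[len(values) // 2]

def quartile_3(values):
    # element roughly three quarters of the way through the sorted data
    return sorted(values)[int(len(values) * .75)]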


Source: https://stackoverflow.com/questions/27472330/how-should-the-interquartile-range-be-calculated-in-python
