statistics for histogram of periodic data

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-09 20:07:35

Question


For a series of angle values in the range (-pi, pi), I make a histogram. Is there an effective way to calculate the mean and the modal (most probable) value? Consider the following example:

import numpy as N, cmath
deg = N.pi/180.
d = N.array([-175., 170, 175, 179, -179])*deg
i = N.sum(N.exp(1j*d))
ave = cmath.phase(i)
i /= float(d.size)
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))

print(ave/deg, stdev/deg)

Now, let's have a histogram:

counts, bins = N.histogram(d, N.linspace(-N.pi, N.pi, 360))

Is it possible to calculate the mean and the mode given only counts and bins? For non-periodic data, calculating the mean is straightforward:

ave = N.sum(counts*bins[:-1]) / float(N.sum(counts))

Calculating a modal value requires more effort. Actually, I'm not sure my code below is correct: first I identify the bins that occur most frequently, and then I calculate the arithmetic mean of their positions:

cmax = N.max(counts)  # highest count found in any bin
mode = N.mean(N.take(bins, N.nonzero(counts == cmax)[0]))
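
For periodic data, the tied bins may sit on both sides of the +/-pi seam, so an arithmetic mean of their positions can land on the opposite side of the circle. A minimal sketch of one alternative, assuming that bin centres represent each bin and that ties are combined with a circular mean; `circular_mode` is a hypothetical helper name, not part of NumPy, and the code uses only the standard library:

```python
import cmath
import math

def circular_mode(counts, edges):
    """Return the modal angle (radians) of a histogram over a periodic range.

    counts: sequence of bin counts; edges: len(counts) + 1 bin edges.
    Ties between equally full bins are resolved with a circular mean.
    """
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    top = max(counts)
    # Sum unit vectors pointing at every bin that reaches the maximum count.
    resultant = sum(cmath.exp(1j * c) for c, n in zip(centers, counts) if n == top)
    return cmath.phase(resultant)

# Four 90-degree bins over (-pi, pi); the first and last bins tie.
edges = [-math.pi, -math.pi / 2, 0.0, math.pi / 2, math.pi]
counts = [5, 1, 0, 5]
mode = circular_mode(counts, edges)
```

In this toy histogram the two tied bins are centred at -135 and +135 degrees. Their arithmetic mean is 0, while the circular mean points at the seam, which is where the counts actually cluster.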

I have no idea how to calculate the standard deviation from such data, though. One obvious solution to all my problems (at least those described above) is to convert the histogram data back into a data series and use that in the calculations. This is neither elegant nor efficient, however.

Any hints would be much appreciated.


This is the partial solution I wrote.

import numpy as N, cmath
import scipy.stats as ST

d = [-175, 170.2, 175.57, 179, -179, 170.2, 175.57, 170.2]
deg = N.pi/180.
data = N.array(d)*deg

i = N.sum(N.exp(1j*data))
ave = cmath.phase(i)  # correct and exact mean for periodic data
wrong_ave = N.mean(d)

i /= float(data.size)
stdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
wrong_stdev = N.std(d)

bins = N.linspace(-N.pi, N.pi, 360)
counts, bins = N.histogram(data, bins)  # raw counts (the default)
# consider it weighted vector addition
nz = N.nonzero(counts)[0]
weight = counts[nz]
i = N.sum(weight * N.exp(1j*bins[nz])/len(nz))
pave = cmath.phase(i)  # correct and approximated mean for periodic data
i /= sum(weight)/float(len(nz))
pstdev = -2. * N.log(N.sqrt(i.real**2 + i.imag**2))
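print(' mean: %12.3f %12.3f %12.3f' % (ave/deg, wrong_ave, pave/deg))
print('stdev: %12.3f %12.3f %12.3f' % (stdev/deg, wrong_stdev, pstdev/deg))
print()
print('scipy: %12.3f (mean) %12.3f (stdev)' % (ST.circmean(data)/deg,
                                               ST.circstd(data)/deg))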

When run, it gives the following results:

 mean:      175.840       85.843      175.360
stdev:        0.472      151.785        0.430

scipy:      175.840 (mean)        3.673 (stdev)

A few comments: the first column gives the mean and stdev calculated from the raw data. As can be seen, the mean agrees well with scipy.stats.circmean (thanks to JoeKington for pointing it out). Unfortunately, the stdev differs; I will look at it later. The second column gives completely wrong results (the non-periodic mean/std from numpy obviously does not work here). The third column gives what I wanted to obtain from the histogram data (@JoeKington: my raw data won't fit in my computer's memory; @dmytro: thanks for your input. Of course the bin size will influence the result, but in my application I don't have much choice, i.e. I have to reduce the data somehow). As can be seen, the mean in the third column is calculated properly, while the stdev needs further attention :)
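
On the stdev mismatch: with R the length of the mean resultant vector, the circular standard deviation is usually defined as sqrt(-2 ln R), in radians. The expression `-2. * N.log(...)` in the code above is the square of that quantity, which may explain at least part of the disagreement. A minimal sketch on the same eight sample angles, using only the standard library and assuming R > 0; `circ_stats` is a hypothetical helper name:

```python
import cmath
import math

def circ_stats(angles):
    """Circular mean and standard deviation (both in radians) of raw angles."""
    z = sum(cmath.exp(1j * a) for a in angles) / len(angles)  # mean resultant vector
    r = min(abs(z), 1.0)  # resultant length; clamp rounding noise above 1
    return cmath.phase(z), math.sqrt(-2.0 * math.log(r))

deg = math.pi / 180.0
d = [-175, 170.2, 175.57, 179, -179, 170.2, 175.57, 170.2]
mean, std = circ_stats([a * deg for a in d])
print('mean %.3f deg, std %.3f deg' % (mean / deg, std / deg))
```

Because the result is already in radians, dividing by `deg` converts it to degrees. Dividing the squared quantity by `deg` mixes units.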


Answer 1:


Have a look at scipy.stats.circmean and scipy.stats.circstd.

Or do you only have the histogram counts, and not the "raw" data? If so, you could fit a von Mises distribution to your histogram counts and approximate the mean and stddev that way.
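
One way to follow the von Mises suggestion with only the histogram is a moment fit. Take the mean direction from the count-weighted resultant of the bin centres, then estimate the concentration kappa from the resultant length R. The sketch below assumes a commonly quoted piecewise approximation for kappa (usually attributed to Fisher's text on circular statistics) rather than a maximum-likelihood fit. It ignores the small bias introduced by binning and uses the hypothetical helper name `fit_von_mises`:

```python
import cmath
import math

def fit_von_mises(counts, edges):
    """Moment estimate (mu, kappa) of a von Mises distribution from a histogram."""
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    total = float(sum(counts))
    # Count-weighted mean resultant vector of the bin centres.
    z = sum(n * cmath.exp(1j * c) for n, c in zip(counts, centers)) / total
    r = min(abs(z), 1.0 - 1e-12)  # keep strictly below 1 so kappa stays finite
    # Piecewise approximation to the inverse of R = I1(kappa) / I0(kappa).
    if r < 0.53:
        kappa = 2.0 * r + r**3 + 5.0 * r**5 / 6.0
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1.0 - r)
    else:
        kappa = 1.0 / (r**3 - 4.0 * r**2 + 3.0 * r)
    return cmath.phase(z), kappa

# 36 ten-degree bins over (-pi, pi); counts peak on both sides of the seam.
edges = [-math.pi + k * math.pi / 18 for k in range(37)]
counts = [0] * 36
counts[0], counts[1], counts[34], counts[35] = 40, 10, 10, 40
mu, kappa = fit_von_mises(counts, edges)
```

For sharply peaked data, 1/sqrt(kappa) is roughly the angular spread in radians, so kappa can stand in for the stddev.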




Answer 2:


Here's how to get an approximation.

Since Var(x) = <x^2> - <x>^2, we have:

meanX = N.sum(counts * bins[:-1]) / N.sum(counts)
meanX2 = N.sum(counts * bins[:-1]**2) / N.sum(counts)
std = N.sqrt(meanX2 - meanX**2)
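
The formula above treats the bin positions as ordinary numbers, so it can badly overestimate the spread when the counts straddle the +/-pi seam. One possible workaround, assuming the data occupy well under half the circle, is to rotate the bin centres so the circular mean sits at zero and wrap them back into (-pi, pi]. Only then is Var(x) = <x^2> - <x>^2 applied. In this sketch `hist_mean_std` is a hypothetical helper name and only the standard library is used:

```python
import cmath
import math

def hist_mean_std(counts, edges):
    """Approximate mean and std (radians) of periodic data from a histogram."""
    centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    total = float(sum(counts))
    # Circular mean from the count-weighted resultant of the bin centres.
    mean = cmath.phase(sum(n * cmath.exp(1j * c) for n, c in zip(counts, centers)))
    # Offsets from the mean, wrapped so the seam no longer matters.
    offs = [cmath.phase(cmath.exp(1j * (c - mean))) for c in centers]
    m1 = sum(n * x for n, x in zip(counts, offs)) / total
    m2 = sum(n * x * x for n, x in zip(counts, offs)) / total
    return mean, math.sqrt(max(m2 - m1 * m1, 0.0))

# 36 ten-degree bins; all counts sit in the two bins adjacent to the seam.
edges = [-math.pi + k * math.pi / 18 for k in range(37)]
counts = [0] * 36
counts[0], counts[35] = 50, 50
mean, std = hist_mean_std(counts, edges)
```

Here the two occupied bins are centred 5 degrees either side of the seam. The wrapped offsets are therefore +/-5 degrees, whereas the unwrapped positions would be +/-175 degrees.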


Source: https://stackoverflow.com/questions/10269129/statistics-for-histogram-of-periodic-data
