How to find bin edges of given bin number returned by scipy.stats.binned_statistic_dd()?

自闭症网瘾萝莉.ら 提交于 2019-12-06 15:49:07

It helps to first be familiar with np.unravel_index. It converts a "flat index" (i.e. binnumber!) to a tuple of coordinates. You can think of the flat index as the index into arr.ravel(), and the tuple of coordinates as the index into arr. For example, if in the diagram below we think of the numbers 0,1,2,3,4,5 as bin numbers:

   | 0 | 1 | 2 |
---+---+---+---|
 0 | 0 | 1 | 2 |
 1 | 3 | 4 | 5 |
   +---+---+---|

then np.unravel_index(4, (2,3))

In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)

equals (1,1) because the 4th bin number in an array of shape (2,3) has coordinate (1,1).

Okay then. Next, we need to know that internally scipy.stats.binned_statistic_dd adds two edges to the given bin edges to handle outliers:

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]

So the edge coordinates corresponding to the bin numbers is given by

edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

(We use len(edge)-1 because the shape of the array axis is one less than the number of edges.)


For example:

import itertools as IT
import numpy as np
import scipy.stats as stats

sample = np.array(list(IT.product(np.arange(5)-0.5, 
                                  np.arange(5)*10-5, 
                                  np.arange(5)*100-50)))
bins = [np.arange(4),
        np.arange(4)*10,
        np.arange(4)*100]

statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample=sample, values=sample, statistic='count', 
    bins=bins, 
    range=[(0,100)]*3)

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])


for samp, idx in zip(sample, zip(*edge_index)):
    vert = [edge[i] for i, edge in zip(idx, bin_edges)]
    print('{} goes in bin with left-most corner: {}'.format(samp, vert))

yields

[ -0.5  -5.  -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5  -5.   50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[  -0.5   -5.   150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[  -0.5   -5.   250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!