How to find bin edges of given bin number returned by scipy.stats.binned_statistic_dd()?

家住魔仙堡 提交于 2019-12-08 05:11:18

问题


I have a Nx3 array mm. The function call

c,edg,idx = scipy.stats.binned_statistic_dd(mm,[], statistic='count',bins=(30,20,10),rg=((3,5),(2,8),(4,6)))

returns idx, which is a 1d array of ints that represents the bin in which each element of mm falls, and edg is a list of 3 arrays holding the bin edges

What I need is to find the bin edges of a given bin given it's binnumber in idx. For example, given idx=[24,153,...,72] I want to find the edges of say bin 153, i.e. where that bin falls in edg. Of course I can find the elements in bin 153 by mm[153], but not the edges.

I posted this Nx3 case just for clarity. In reality, I am looking for a solution to the NxD case.


回答1:


It helps to first be familiar with np.unravel_index. It converts a "flat index" (i.e. binnumber!) to a tuple of coordinates. You can think of the flat index as the index into arr.ravel(), and the tuple of coordinates as the index into arr. For example, if in the diagram below we think of the numbers 0,1,2,3,4,5 as bin numbers:

   | 0 | 1 | 2 |
---+---+---+---|
 0 | 0 | 1 | 2 |
 1 | 3 | 4 | 5 |
   +---+---+---|

then np.unravel_index(4, (2,3))

In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)

equals (1,1) because the 4th bin number in an array of shape (2,3) has coordinate (1,1).

Okay then. Next, we need to know that internally scipy.stats.binned_statistic_dd adds two edges to the given bin edges to handle outliers:

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]

So the edge coordinates corresponding to the bin numbers is given by

edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

(We use len(edge)-1 because the shape of the array axis is one less than the number of edges.)


For example:

import itertools as IT
import numpy as np
import scipy.stats as stats

sample = np.array(list(IT.product(np.arange(5)-0.5, 
                                  np.arange(5)*10-5, 
                                  np.arange(5)*100-50)))
bins = [np.arange(4),
        np.arange(4)*10,
        np.arange(4)*100]

statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample=sample, values=sample, statistic='count', 
    bins=bins, 
    range=[(0,100)]*3)

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])


for samp, idx in zip(sample, zip(*edge_index)):
    vert = [edge[i] for i, edge in zip(idx, bin_edges)]
    print('{} goes in bin with left-most corner: {}'.format(samp, vert))

yields

[ -0.5  -5.  -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5  -5.   50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[  -0.5   -5.   150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[  -0.5   -5.   250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...


来源:https://stackoverflow.com/questions/28741853/how-to-find-bin-edges-of-given-bin-number-returned-by-scipy-stats-binned-statist

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!