问题
I have a Nx3 array mm. The function call
c,edg,idx = scipy.stats.binned_statistic_dd(mm,[], statistic='count',bins=(30,20,10),rg=((3,5),(2,8),(4,6)))
returns idx, which is a 1d array of ints that represents the bin in which each element of mm falls, and edg is a list of 3 arrays holding the bin edges
What I need is to find the bin edges of a given bin given it's binnumber in idx. For example, given idx=[24,153,...,72] I want to find the edges of say bin 153, i.e. where that bin falls in edg. Of course I can find the elements in bin 153 by mm[153], but not the edges.
I posted this Nx3 case just for clarity. In reality, I am looking for a solution to the NxD case.
回答1:
It helps to first be familiar with np.unravel_index. It converts a "flat index" (i.e. binnumber!) to a tuple of coordinates. You can think of the flat index as the index into arr.ravel()
, and the tuple of coordinates as the index into arr
. For example, if in the diagram below we think of the numbers 0,1,2,3,4,5 as bin numbers:
| 0 | 1 | 2 |
---+---+---+---|
0 | 0 | 1 | 2 |
1 | 3 | 4 | 5 |
+---+---+---|
then np.unravel_index(4, (2,3))
In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)
equals (1,1)
because the 4th bin number in an array of shape (2,3)
has coordinate (1,1)
.
Okay then. Next, we need to know that internally scipy.stats.binned_statistic_dd
adds two edges to the given bin edges to handle outliers:
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
So the edge coordinates corresponding to the bin numbers is given by
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])
(We use len(edge)-1
because the shape of the array axis is one less than the
number of edges.)
For example:
import itertools as IT
import numpy as np
import scipy.stats as stats
sample = np.array(list(IT.product(np.arange(5)-0.5,
np.arange(5)*10-5,
np.arange(5)*100-50)))
bins = [np.arange(4),
np.arange(4)*10,
np.arange(4)*100]
statistic, bin_edges, binnumber = stats.binned_statistic_dd(
sample=sample, values=sample, statistic='count',
bins=bins,
range=[(0,100)]*3)
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])
for samp, idx in zip(sample, zip(*edge_index)):
vert = [edge[i] for i, edge in zip(idx, bin_edges)]
print('{} goes in bin with left-most corner: {}'.format(samp, vert))
yields
[ -0.5 -5. -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5 -5. 50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[ -0.5 -5. 150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[ -0.5 -5. 250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...
来源:https://stackoverflow.com/questions/28741853/how-to-find-bin-edges-of-given-bin-number-returned-by-scipy-stats-binned-statist