Question
My data looks like this (ch = channel, det = detector):
ch  det  time  counts
1   1    0     123
    2    0     121
    3    0     125
2   1    0     212
    2    0     210
    3    0     210
1   1    1     124
    2    1     125
    3    1     123
2   1    1     210
    2    1     209
    3    1     213
Note that in reality the time column is a float with 12 or so significant digits. It is still constant for all detectors of one measurement, but its value is neither predictable nor part of a sequence.
What I need to create is a data frame that looks like this:
ch  time  mean_counts_over_detectors
1   0     xxx
2   0     yyy
1   1     zzz
2   1     www
I.e., I would like to apply np.mean over all counts of the detectors of one channel at each time separately. I could write kludgy loops, but I feel that pandas must have something built in for this. I am still a beginner at pandas, and especially with MultiIndex there are so many concepts that I am not sure what I should be looking for in the docs.
The title contains 'condition' because I thought that maybe the fact that I want the mean over all detectors of one channel for the counts where the time is the same can be expressed as a slicing condition.
Answer 1:
Same as @meteore but with a MultiIndex.
In [55]: df
Out[55]:
             counts
ch det time
1  1   0        123
   2   0        121
   3   0        125
2  1   0        212
   2   0        210
   3   0        210
1  1   1        124
   2   1        125
   3   1        123
2  1   1        210
   2   1        209
   3   1        213
In [56]: df.index
Out[56]:
MultiIndex
[(1L, 1L, 0L) (1L, 2L, 0L) (1L, 3L, 0L) (2L, 1L, 0L) (2L, 2L, 0L)
(2L, 3L, 0L) (1L, 1L, 1L) (1L, 2L, 1L) (1L, 3L, 1L) (2L, 1L, 1L)
(2L, 2L, 1L) (2L, 3L, 1L)]
In [57]: df.index.names
Out[57]: ['ch', 'det', 'time']
In [58]: df.groupby(level=['ch', 'time']).mean()
Out[58]:
             counts
ch time
1  0     123.000000
   1     124.000000
2  0     210.666667
   1     210.666667
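For anyone who wants to reproduce the session above, here is a minimal sketch that rebuilds the frame from the sample data in the question. The names `rows` and `mean_counts` are only illustrative, and building the index with `set_index` is one possible way to get there, not necessarily how the frame above was created.

```python
import pandas as pd

# Sample data from the question as (ch, det, time, counts) tuples.
rows = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]

# Move the three key columns into a MultiIndex, leaving 'counts' as data.
df = (
    pd.DataFrame(rows, columns=["ch", "det", "time", "counts"])
    .set_index(["ch", "det", "time"])
)

# Group on two of the three index levels; the 'det' level is averaged away.
mean_counts = df.groupby(level=["ch", "time"]).mean()
print(mean_counts)
```

Calling `mean_counts.reset_index()` afterwards should give the flat ch / time / mean layout asked for in the question.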
Be careful with floats and groupby (this applies with or without a MultiIndex): values that should be equal can end up in different groups because of floating-point representation and accuracy limits.
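A small sketch of that pitfall, using made-up time values (not data from the question). It assumes the times that belong together agree to within some tolerance, so rounding to a fixed number of decimals before grouping is acceptable. The 9 decimals and the `time_key` column name are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical timestamps: 0.1 + 0.2 and 0.3 should denote the same
# measurement, but they differ in the last bit of their float representation.
t0 = 0.1 + 0.2
t1 = 1.7
df = pd.DataFrame({
    "ch":     [1, 1, 1, 1, 1, 1],
    "det":    [1, 2, 3, 1, 2, 3],
    "time":   [t0, 0.3, t0, t1, t1, t1],
    "counts": [123, 121, 125, 124, 125, 123],
})

# Grouping on the raw floats splits the first measurement into two groups.
raw = df.groupby(["ch", "time"])["counts"].mean()

# Grouping on a rounded copy of the key merges them again.
df["time_key"] = df["time"].round(9)
rounded = df.groupby(["ch", "time_key"])["counts"].mean()

print(len(raw), len(rounded))
```

If the time values really are bit-for-bit identical within a measurement, as the question suggests, the rounding step is not needed.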
Answer 2:
Not using MultiIndexes (if you have them, you can get rid of them through df.reset_index()):
import numpy as np
import pandas as pd
chans = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
df = pd.DataFrame(dict(ch=chans, det=[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3], time=6*[0] + 6*[1], counts=np.random.randint(0, 500, 12)))
Use groupby and mean as an aggregation function:
>>> df.groupby(['time', 'ch'])['counts'].mean()
time  ch
0     1     315.000000
      2     296.666667
1     1     178.333333
      2     221.666667
Name: counts
Other aggregation functions can be passed via agg:
>>> df.groupby(['time', 'ch'])['counts'].agg(np.ptp)
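As a further sketch, agg also accepts a list of function names, and reset_index turns the grouped result back into ordinary columns. The example uses the fixed counts from the question instead of random numbers so the result is reproducible. The variable name `summary` is just illustrative.

```python
import pandas as pd

# Same layout as above, but with the fixed sample counts from the question.
df = pd.DataFrame({
    "ch":     [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    "det":    [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "time":   6 * [0] + 6 * [1],
    "counts": [123, 121, 125, 212, 210, 210, 124, 125, 123, 210, 209, 213],
})

# Several aggregations at once; each name becomes a column in the result.
summary = (
    df.groupby(["time", "ch"])["counts"]
    .agg(["mean", "min", "max"])
    .reset_index()  # turn the (time, ch) group keys back into columns
)
print(summary)
```

The resulting frame has one row per (time, ch) pair with the columns time, ch, mean, min and max.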
Source: https://stackoverflow.com/questions/13119515/how-to-apply-condition-on-level-of-pandas-multiindex