Question
I have the following sample dataframe in Python pandas:
+---+------+------+------+
| | col1 | col2 | col3 |
+---+------+------+------+
| 0 | a | d | b |
+---+------+------+------+
| 1 | a | c | b |
+---+------+------+------+
| 2 | c | b | c |
+---+------+------+------+
| 3 | b | b | c |
+---+------+------+------+
| 4 | a | a | d |
+---+------+------+------+
I would like to count all the 'a', 'b', 'c', and 'd' values across col1 through col3 so that I end up with a dataframe like this:
+---+--------+-------+
| | letter | count |
+---+--------+-------+
| 0 | a | 4 |
+---+--------+-------+
| 1 | b | 5 |
+---+--------+-------+
| 2 | c | 4 |
+---+--------+-------+
| 3 | d | 2 |
+---+--------+-------+
One way to do this is to stack the columns on top of each other and then do a groupby count, but I feel like there has to be a better way. Can someone help me with this?
Answer 1:
You can stack() the dataframe to turn all columns into a single Series of values and then call value_counts() on it:
df.stack().value_counts()
b 5
c 4
a 4
d 2
dtype: int64
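To get from that Series to the letter/count layout the question asks for, something like the following should work. It is only a sketch: the frame is rebuilt from the sample table in the question, and the names `counts` and `result` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "c", "b", "a"],
    "col2": ["d", "c", "b", "b", "a"],
    "col3": ["b", "b", "c", "c", "d"],
})

# stack() collapses the three columns into one long Series;
# value_counts() tallies each letter, and sort_index() orders the letters.
counts = df.stack().value_counts().sort_index()

# Name the index "letter" and turn it into a column next to "count".
result = counts.rename_axis("letter").reset_index(name="count")
print(result)
```

Sorting by index is optional; without it the rows come back ordered by count, as in the output above.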
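Answer 2:
You can apply value_counts to each column and then sum the per-column counts across columns:
# Per-column counts; a letter that never occurs in a column shows up as NaN
print(df.apply(pd.Series.value_counts))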
col1 col2 col3
a 3.0 1 NaN
b 1.0 2 2.0
c 1.0 1 2.0
d NaN 1 1.0
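# Sum across columns (NaN is skipped), then move the letters out of the index
df1 = df.apply(pd.Series.value_counts).sum(axis=1).reset_index()
df1.columns = ['letter', 'count']
# The NaN values made the counts float, so cast them back to int
df1['count'] = df1['count'].astype(int)
print(df1)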
letter count
0 a 4
1 b 5
2 c 4
3 d 2
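For anyone who wants to run this approach end to end, here is a self-contained sketch. The frame is rebuilt from the question's sample table, and `per_column`, `totals`, and `result` are illustrative names. It uses `pd.Series.value_counts` rather than the top-level `pd.value_counts`, on the assumption that a recent pandas is installed where the top-level function is deprecated.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "c", "b", "a"],
    "col2": ["d", "c", "b", "b", "a"],
    "col3": ["b", "b", "c", "c", "d"],
})

# Count letters within each column; letters absent from a column become NaN.
per_column = df.apply(pd.Series.value_counts)

# sum(axis=1) skips NaN by default, so a missing letter contributes nothing.
# The result is float because of the NaN values, hence the cast to int.
totals = per_column.sum(axis=1).astype(int).sort_index()

# Name the index "letter" and turn it into a column next to "count".
result = totals.rename_axis("letter").reset_index(name="count")
print(result)
```

The intermediate `per_column` table is the main reason to prefer this route: it keeps the per-column breakdown available if that is needed alongside the totals.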
Source: https://stackoverflow.com/questions/37911043/how-to-groupby-count-across-multiple-columns-in-pandas