I have a pandas.DataFrame that looks like this:
  COL1 COL2 COL3
0   C1 None None
1   C1   C2 None
2   C1   C1 None
3   C1   C2   C3
You could apply value_counts:
In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]:
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN
So you can fill the NaN and select just the base values you want:
In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]:
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
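If you'd rather end up with integer counts (fillna can leave the columns as floats), a small follow-up that isn't in the original answer is to cast at the end:

df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0).astype(int)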
Note: there's an open issue to have a value_counts method directly on a DataFrame (which I think will be introduced in pandas 0.15).
Andy's answer is spot on.
I'm adding this answer for the case where the C1, C2, ..., Cn list is huge and we want to view only a subset of the values (see the sketch after the output below for a generalization).
dff = df.copy()
dff['C1'] = (df == 'C1').T.sum()   # row-wise count of 'C1'
dff['C2'] = (df == 'C2').T.sum()   # row-wise count of 'C2'
dff['C3'] = (df == 'C3').T.sum()   # row-wise count of 'C3'
dff
  COL1 COL2 COL3  C1  C2  C3
0   C1 None None   1   0   0
1   C1   C2 None   1   1   0
2   C1   C1 None   2   0   0
3   C1   C2   C3   1   1   1
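A minimal sketch of that generalization (the wanted list and variable names are my own): loop over just the values you care about instead of writing one line per value.

wanted = ['C1', 'C2']                 # hypothetical subset of interest
dff = df.copy()
for v in wanted:
    dff[v] = (df == v).sum(axis=1)    # same result as (df == v).T.sum()
dff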
Usually, applying a Series function to the whole DataFrame with apply will slow the whole process down. Additional reading: Link
df.mask(df.eq('None')).stack().str.get_dummies().sum(level=0)
Out[165]:
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
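Note that on recent pandas (2.0 and later) sum(level=0) has been removed; the equivalent spelling groups on the index level instead:

df.mask(df.eq('None')).stack().str.get_dummies().groupby(level=0).sum()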
Or you can do it with Counter:
from collections import Counter
pd.DataFrame([Counter(x) for x in df.values]).drop('None', axis=1)
Out[170]:
   C1   C2   C3
0   1  NaN  NaN
1   1  1.0  NaN
2   2  NaN  NaN
3   1  1.0  1.0
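If you want the same zero-filled integer layout as the other answers, a small follow-up sketch (assuming the 'None' column exists, as above):

pd.DataFrame([Counter(x) for x in df.values]).drop('None', axis=1).fillna(0).astype(int)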