I have a pandas DataFrame in which the first column is user_id and the remaining columns are tags (tag_0 to tag_122). The data is in the following format:
Why not use np.count_nonzero?
np.count_nonzero(df)          # total number of non-zeros in the whole DataFrame
np.count_nonzero(df, axis=0)  # one count per column
np.count_nonzero(df, axis=1)  # one count per row
It works with dates too.
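A minimal sketch of the three calls, assuming a small hypothetical frame with a user_id column and two tag columns (the real data has tag_0 to tag_122). Because user_id would be counted too, the sketch drops it before counting.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: user_id plus two tag columns.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "tag_0": [0, 5, 0],
    "tag_1": [2, 0, 0],
})

tags = df.drop(columns="user_id")  # count only the tag columns

total = np.count_nonzero(tags)               # non-zeros in the whole frame
per_column = np.count_nonzero(tags, axis=0)  # ndarray, one count per column
per_row = np.count_nonzero(tags, axis=1)     # ndarray, one count per row

print(total)       # 2
print(per_column)  # [1 1]
print(per_row)     # [1 1 0]
```

Note that with an axis the result is a plain NumPy array, so the column and row labels are lost.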
My favorite way of getting the number of non-zero values in each column is
df.astype(bool).sum(axis=0)
For the number of non-zero values in each row, use
df.astype(bool).sum(axis=1)
(Thanks to Skulas)
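A short sketch of the astype(bool) approach, assuming the same kind of hypothetical data with user_id moved into the index so it is not counted. Unlike np.count_nonzero with an axis, the result here is a Series labelled by column name (axis=0) or by user_id (axis=1).

```python
import pandas as pd

# Hypothetical sample data; user_id becomes the index so it is not counted.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "tag_0": [0, 5, 0],
    "tag_1": [2, 0, 0],
}).set_index("user_id")

# Non-zero values become True and zeros become False; summing counts the Trues.
per_column = df.astype(bool).sum(axis=0)  # Series indexed by column name
per_row = df.astype(bool).sum(axis=1)     # Series indexed by user_id

print(per_column)
print(per_row)
```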
If your DataFrame contains NaNs, fill them with zero first; otherwise they will be counted as non-zero, because NaN is truthy and astype(bool) turns it into True.
df.fillna(0).astype(bool).sum(axis=1)
(Thanks to SirC)
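To illustrate the NaN pitfall, here is a sketch on hypothetical data with one missing value, comparing the row counts with and without fillna(0).

```python
import numpy as np
import pandas as pd

# Hypothetical data containing one missing value in the second row.
df = pd.DataFrame({"tag_0": [0.0, np.nan, 3.0], "tag_1": [1.0, 0.0, 0.0]})

# NaN is truthy, so astype(bool) maps it to True and it gets counted.
naive = df.astype(bool).sum(axis=1)
# Filling NaN with 0 first treats missing values as zeros.
fixed = df.fillna(0).astype(bool).sum(axis=1)

print(naive.tolist())  # [1, 1, 1]
print(fixed.tolist())  # [1, 0, 1]
```

Only the second row differs: its NaN is counted in the naive version and ignored after fillna(0).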
To count non-zero values, just do (column != 0).sum(), where column is the data you want to count. column != 0 returns a boolean array, and since True counts as 1 and False as 0, summing it gives the number of elements that match the condition.
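A minimal sketch of the boolean-mask count on a single hypothetical Series:

```python
import pandas as pd

# Hypothetical column of tag values.
column = pd.Series([0, 4, 0, 7, 1])

mask = column != 0       # boolean Series, True where the value is non-zero
count = int(mask.sum())  # True counts as 1 and False as 0

print(mask.tolist())  # [False, True, False, True, True]
print(count)          # 3
```

The same expression on a whole DataFrame, (df != 0).sum(), gives one count per column. As with astype(bool), NaN != 0 evaluates to True, so missing values are counted unless you fill them first.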
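So to get your desired result (for each user_id, the sum of each column divided by its number of non-zero values, i.e. the mean of the non-zero values), do
df.groupby('user_id').apply(lambda group: group.sum() / (group != 0).sum())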
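With apply, the lambda receives each group as a sub-DataFrame, and depending on the pandas version the user_id column may be included in it. A variant that is a swapped-in alternative rather than the answer's exact code is to use agg, which calls the lambda once per tag column within each group and leaves the grouping column out. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: two rows for user 1, one row for user 2.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "tag_0": [4, 0, 6],
    "tag_1": [2, 6, 3],
})

# agg applies the lambda to each tag column of each user_id group:
# sum of the values divided by the number of non-zero values.
result = df.groupby("user_id").agg(lambda col: col.sum() / (col != 0).sum())

print(result)
```

For user 1, tag_0 is 4 / 1 = 4.0 and tag_1 is 8 / 2 = 4.0. If every value in a column is zero for some user, the division is 0 / 0 and that cell comes out as NaN (NumPy may also emit a RuntimeWarning).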