Counting non-zero values in each column of a DataFrame in Python

前端未结

I have a pandas DataFrame in which the first column is user_id and the rest of the columns are tags (tag_0 to tag_122). I have the data in the following format:

3 Answers

时光说笑

2020-12-07 22:59
Why not use `np.count_nonzero`?
1. To count the non-zeros in the entire DataFrame: `np.count_nonzero(df)`
2. To count the non-zeros in each column (collapsing the rows): `np.count_nonzero(df, axis=0)`
3. To count the non-zeros in each row (collapsing the columns): `np.count_nonzero(df, axis=1)`

It works with dates too.
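A minimal sketch of the three calls above on a small hypothetical frame (the values are made up for illustration). It drops `user_id` first, an assumption so the ID column is not counted, and wraps the per-column result back into a Series, because `np.count_nonzero` returns a plain NumPy array without the column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a user_id column plus three tag columns
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "tag_0": [0, 1, 2],
    "tag_1": [0, 0, 5],
    "tag_2": [3, 0, 0],
})

# Assumption: exclude the ID so only tag values are counted
tags = df.drop(columns="user_id")

# Whole frame: a single integer
total = np.count_nonzero(tags)

# axis=0 collapses the rows: one count per column (plain ndarray)
per_column = np.count_nonzero(tags, axis=0)

# axis=1 collapses the columns: one count per row
per_row = np.count_nonzero(tags, axis=1)

# Re-attach the column labels that the ndarray does not carry
per_column_labeled = pd.Series(per_column, index=tags.columns)
print(per_column_labeled)
```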
一个人的身影

2020-12-07 23:10
My favorite way of getting the number of non-zeros in each column is
```
df.astype(bool).sum(axis=0)
```
For the number of non-zeros in each row use
```
df.astype(bool).sum(axis=1)
```
(Thanks to Skulas)
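A small check of both directions, assuming `user_id` has been moved into the index (hypothetical sample data) so that it is not counted as a tag. `astype(bool)` maps 0 to `False` and any other number to `True`, and summing booleans counts the `True` values. Unlike `np.count_nonzero`, these calls return labeled Series.

```python
import pandas as pd

# Hypothetical sample; user_id lives in the index so it is not summed
df = pd.DataFrame(
    {"tag_0": [0, 1, 2], "tag_1": [0, 0, 5], "tag_2": [3, 0, 0]},
    index=pd.Index([1, 2, 3], name="user_id"),
)

# 0 -> False, non-zero -> True; sum counts the True entries
nonzero_per_column = df.astype(bool).sum(axis=0)  # indexed by tag name
nonzero_per_row = df.astype(bool).sum(axis=1)     # indexed by user_id

print(nonzero_per_column)
print(nonzero_per_row)
```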

If you have NaNs in your df, replace them with zero first; otherwise they will be counted as 1 (NaN casts to `True`).
```
df.fillna(0).astype(bool).sum(axis=1)
```
(Thanks to SirC)
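To see why the `fillna(0)` step matters, here is a sketch on a hypothetical frame containing NaNs: without it, each NaN is cast to `True` and counted as a non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values
df = pd.DataFrame({"tag_0": [0, np.nan, 2], "tag_1": [np.nan, 0, 5]})

# NaN is truthy, so astype(bool) turns it into True and it is counted
naive = df.astype(bool).sum(axis=1)

# Filling NaN with 0 first keeps missing values out of the count
fixed = df.fillna(0).astype(bool).sum(axis=1)

print(naive.tolist(), fixed.tolist())
```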
庸人自扰

2020-12-07 23:17
To count non-zero values, just do `(column != 0).sum()`, where `column` is the data you want to count. `column != 0` returns a boolean array, and since `True` counts as 1 and `False` as 0, summing it gives the number of elements that match the condition.
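A minimal sketch of the boolean-mask idea on a hypothetical frame, first for a single column and then for all tag columns at once (dropping `user_id` is an assumption, so the ID is not counted). Note that `NaN != 0` is also `True`, so missing values would be counted here too unless they are filled first.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "tag_0": [0, 1, 2],
    "tag_1": [0, 0, 5],
})

column = df["tag_0"]

# Element-wise comparison gives a boolean Series
mask = column != 0

# True sums as 1 and False as 0, so this is the non-zero count
count = mask.sum()

# The same comparison on the whole tag block counts per column
per_column = (df.drop(columns="user_id") != 0).sum()

print(count)
print(per_column)
```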

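So, to get your desired result (for each user, the sum of each tag column divided by its number of non-zero entries, i.e. the mean of the non-zero values), group by `user_id`, select the tag columns so the ID itself is not included, and do
```
tag_cols = [c for c in df.columns if c != 'user_id']
df.groupby('user_id')[tag_cols].apply(lambda group: group.sum() / (group != 0).sum())
```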
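A sketch of the per-user computation on a tiny hypothetical frame where user 1 has two rows. It runs the `apply` form and, as an alternative that avoids calling a Python lambda per group, divides two grouped results directly. A tag that is zero in all of a user's rows gives 0/0, which pandas returns as NaN.

```python
import pandas as pd

# Hypothetical sample: user 1 has two rows, user 2 has one
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "tag_0": [4, 0, 3],
    "tag_1": [2, 6, 0],
})

tag_cols = [c for c in df.columns if c != "user_id"]

# apply form: each group is one user's sub-frame of tag columns
via_apply = df.groupby("user_id")[tag_cols].apply(
    lambda group: group.sum() / (group != 0).sum()
)

# Vectorized alternative: per-user sums over per-user non-zero counts
sums = df.groupby("user_id")[tag_cols].sum()
nonzero_counts = (df[tag_cols] != 0).groupby(df["user_id"]).sum()
vectorized = sums / nonzero_counts

print(vectorized)
```

Both forms align on `user_id` and the tag names; the vectorized one is usually the simpler choice when there are many users.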