Counting non-zero values in each column of a DataFrame in Python

温柔的废话 2020-12-07 22:54

I have a pandas DataFrame in which the first column is user_id and the remaining columns are tags (tag_0 to tag_122). I have the data in the following format:



        
3 Answers
  • 2020-12-07 22:59

    Why not use np.count_nonzero?

    1. To count the non-zeros in the entire DataFrame: np.count_nonzero(df)
    2. To count the non-zeros in each column (i.e. across all rows): np.count_nonzero(df, axis=0)
    3. To count the non-zeros in each row (i.e. across all columns): np.count_nonzero(df, axis=1)

    It works with dates too.
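A minimal sketch of the three calls above, assuming a small hypothetical frame (the user_id values and tag contents are made up for illustration). It moves user_id into the index so it is not counted as a tag, and it wraps the results in Series because np.count_nonzero returns a bare NumPy array without the column or row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user_id plus a few tag columns.
df = pd.DataFrame({
    "user_id": [101, 102, 103],
    "tag_0": [1, 0, 3],
    "tag_1": [0, 0, 5],
    "tag_2": [0, 0, 0],
    "tag_3": [2, 4, 6],
})
# Move user_id into the index so it is not counted as a tag.
tags = df.set_index("user_id")

total = np.count_nonzero(tags)               # whole frame -> a single int
per_column = np.count_nonzero(tags, axis=0)  # collapse the rows -> one count per column
per_row = np.count_nonzero(tags, axis=1)     # collapse the columns -> one count per row

# np.count_nonzero returns a plain ndarray, so reattach the labels.
per_column = pd.Series(per_column, index=tags.columns)
per_row = pd.Series(per_row, index=tags.index)

print(total)                # 6
print(per_column.tolist())  # [2, 1, 0, 3]
print(per_row.tolist())     # [2, 1, 3]
```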

  • 2020-12-07 23:10

    My favorite way of getting the number of non-zeros in each column is

    df.astype(bool).sum(axis=0)
    

    For the number of non-zeros in each row, use

    df.astype(bool).sum(axis=1)
    

    (Thanks to Skulas)
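A small sketch of the astype(bool) approach, assuming a hypothetical toy frame shaped like the one in the question (user_id first, then tag columns). Unlike np.count_nonzero, the pandas version keeps the labels, so the per-column counts are indexed by tag name and the per-row counts by user_id.

```python
import pandas as pd

# Hypothetical toy data: user_id plus a few tag columns.
df = pd.DataFrame({
    "user_id": [101, 102, 103],
    "tag_0": [1, 0, 3],
    "tag_1": [0, 0, 5],
    "tag_2": [0, 0, 0],
    "tag_3": [2, 4, 6],
})
# Keep user_id out of the counts by using it as the index.
tags = df.set_index("user_id")

# astype(bool) maps every non-zero value to True and every 0 to False;
# summing the booleans then counts the True entries.
per_column = tags.astype(bool).sum(axis=0)  # Series indexed by tag name
per_row = tags.astype(bool).sum(axis=1)     # Series indexed by user_id

print(per_column.tolist())  # [2, 1, 0, 3]
print(per_row.tolist())     # [2, 1, 3]
```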

    If your DataFrame contains NaNs, replace them with zero first; otherwise each NaN converts to True and is counted as 1.

    df.fillna(0).astype(bool).sum(axis=1)
    

    (Thanks to SirC)
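A quick demonstration of the NaN pitfall on a hypothetical two-column frame: without fillna(0), every missing value is counted as a non-zero entry.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame containing missing values.
df = pd.DataFrame({
    "tag_0": [1.0, np.nan, 0.0],
    "tag_1": [0.0, 2.0, np.nan],
})

# bool(NaN) is True, so each NaN is silently counted as a non-zero.
naive = df.astype(bool).sum(axis=1)

# Replacing NaN with 0 first keeps missing values out of the count.
fixed = df.fillna(0).astype(bool).sum(axis=1)

print(naive.tolist())  # [1, 2, 1]
print(fixed.tolist())  # [1, 1, 0]
```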

  • 2020-12-07 23:17

    To count non-zero values, just do (column != 0).sum(), where column is the data you want to count. column != 0 returns a boolean array; since True counts as 1 and False as 0, summing it gives the number of elements that satisfy the condition.
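A short sketch of the comparison-and-sum idea on a hypothetical toy frame, first for one column and then for all tag columns at once. The last lines illustrate a caveat not mentioned in the answer: NaN != 0 evaluates to True, so missing values are counted by this method as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user_id plus two tag columns.
df = pd.DataFrame({
    "user_id": [101, 102, 103],
    "tag_0": [1, 0, 3],
    "tag_1": [0, 0, 5],
})

column = df["tag_0"]
mask = column != 0       # boolean Series: True, False, True
print(mask.sum())        # 2 (True counts as 1, False as 0)

# The same comparison over all tag columns gives one count per column.
tag_counts = (df.drop(columns="user_id") != 0).sum()
print(tag_counts.tolist())  # [2, 1]

# Caveat: NaN != 0 is True, so a NaN is counted as non-zero here too.
nan_count = (pd.Series([np.nan, 0.0]) != 0).sum()
print(nan_count)            # 1
```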

    So to get your desired result, do

    df.groupby('user_id').agg(lambda column: column.sum() / (column != 0).sum())
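A runnable sketch of this per-user calculation on a hypothetical long-format frame (several rows per user_id; the question's actual data is not shown). It uses groupby(...).agg, which passes the lambda one tag column of one group at a time and leaves the grouping column user_id out of the arithmetic. Dividing the sum by the count of non-zeros gives the mean of the non-zero entries for each user and tag.

```python
import pandas as pd

# Hypothetical long-format data: several rows per user_id.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "tag_0":   [2, 0, 4, 5, 0],
    "tag_1":   [3, 3, 0, 6, 0],
})

# For each user and each tag column: sum / number of non-zeros,
# i.e. the mean of the non-zero entries. user_id becomes the index.
result = df.groupby("user_id").agg(lambda col: col.sum() / (col != 0).sum())

print(result.loc[1].tolist())  # [3.0, 3.0]
print(result.loc[2].tolist())  # [5.0, 6.0]
```

If a user's values for some tag are all zero, that cell is 0 / 0, which yields NaN.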
    