Get the count of unique values in a row in pandas

悲&欢浪女 2021-01-22 19:57

Suppose I have the following data frame:

0     1        2
new   NaN      NaN
new   one      one
a     b        c
NaN   NaN      NaN

How would I get the count of unique values in each row?

4 Answers
  • 2021-01-22 20:42

    It is not as fast as coldspeed's answer with set(), but you could also do

    df['_num_unique_values'] = df.T.nunique()
    

    First, df.T transposes the DataFrame so that each original row becomes a column; nunique() then counts the distinct values in each of those columns, excluding NaN by default.

    The result, indexed by the original row labels, is added as a new column to the original DataFrame.

    df is now:

         0    1    2  _num_unique_values
    0  new  NaN  NaN                   1
    1  new  one  one                   2
    2    a    b    c                   3
    3  NaN  NaN  NaN                   0
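A minimal, self-contained check of the transpose idea. The frame below is an assumption that rebuilds the table from the question. Because df.T turns each original row into a column, the default column-wise nunique() on the transpose should give the same counts as the row-wise nunique(axis=1).

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (columns 0, 1, 2).
df = pd.DataFrame({
    0: ["new", "new", "a", np.nan],
    1: [np.nan, "one", "b", np.nan],
    2: [np.nan, "one", "c", np.nan],
})

# df.T makes each original row a column; nunique() then counts per column,
# skipping NaN by default (dropna=True).
via_transpose = df.T.nunique()

# The row-wise call gives the same counts without building the transpose.
via_axis = df.nunique(axis=1)

print(via_transpose.tolist())  # [1, 2, 3, 0]
print(via_transpose.equals(via_axis))  # True
```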
    
  • 2021-01-22 20:48

    A more direct solution is the built-in row-wise nunique:

    df['num_uniq'] = df.nunique(axis=1)
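A short sketch of the dropna parameter of DataFrame.nunique, using a frame assumed to match the one in the question. By default NaN is ignored, so the all-NaN row counts as 0; passing dropna=False counts NaN as one more distinct value instead.

```python
import numpy as np
import pandas as pd

# Frame assumed to match the question's table.
df = pd.DataFrame({
    0: ["new", "new", "a", np.nan],
    1: [np.nan, "one", "b", np.nan],
    2: [np.nan, "one", "c", np.nan],
})

# Default: NaN is skipped, so an all-NaN row counts as 0.
skip_nan = df.nunique(axis=1)

# dropna=False treats NaN (however many times it appears) as one extra value.
count_nan = df.nunique(axis=1, dropna=False)

print(skip_nan.tolist())   # [1, 2, 3, 0]
print(count_nan.tolist())  # [2, 2, 3, 1]
```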
    
  • 2021-01-22 20:57

    Use a list comprehension with set:

    df['num_uniq'] = [len(set(v[pd.notna(v)].tolist())) for v in df.values]
    df
    
         0    1    2  num_uniq
    0  new  NaN  NaN         1
    1  new  one  one         2
    2    a    b    c         3
    3  NaN  NaN  NaN         0
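The same list-comprehension logic, written out as a small reusable function. The name count_unique_per_row is hypothetical, and to_numpy() is used here in place of the older .values. Filtering with pd.notna before building the set matters: otherwise NaN could be counted as a value, and since NaN != NaN, separate NaN objects are not even guaranteed to collapse into one set entry.

```python
import numpy as np
import pandas as pd


def count_unique_per_row(frame: pd.DataFrame) -> list[int]:
    """Hypothetical helper: number of distinct non-missing values per row."""
    counts = []
    for row in frame.to_numpy():   # one object ndarray per row
        mask = pd.notna(row)       # False wherever the value is NaN/None
        counts.append(len(set(row[mask].tolist())))
    return counts


# Frame assumed to match the question's table.
df = pd.DataFrame({
    0: ["new", "new", "a", np.nan],
    1: [np.nan, "one", "b", np.nan],
    2: [np.nan, "one", "c", np.nan],
})

df["num_uniq"] = count_unique_per_row(df)
print(df["num_uniq"].tolist())  # [1, 2, 3, 0]
```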
    

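    You could do this with stack, groupby and nunique. Note that the result below is float with NaN for row 3: stack() drops missing values (at least in its legacy implementation), so the all-NaN row never appears in the grouped counts and is filled with NaN when the Series is aligned back onto df.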

    # df.join(df.stack().groupby(level=0).nunique().to_frame('num_uniq'))
    df['num_uniq'] = df.stack().groupby(level=0).nunique()
    df
    
         0    1    2  num_uniq
    0  new  NaN  NaN       1.0
    1  new  one  one       2.0
    2    a    b    c       3.0
    3  NaN  NaN  NaN       NaN
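One way to get integer counts back from the stack approach, sketched on a frame assumed to match the question: reindex the grouped result onto the original index with fill_value=0, so rows that had no values after stacking get 0 instead of NaN. This should give the same counts whether or not your pandas version's stack() drops NaN.

```python
import numpy as np
import pandas as pd

# Frame assumed to match the question's table.
df = pd.DataFrame({
    0: ["new", "new", "a", np.nan],
    1: [np.nan, "one", "b", np.nan],
    2: [np.nan, "one", "c", np.nan],
})

# Long format: a Series indexed by (row label, column label).
stacked = df.stack()

# Distinct values per original row (level 0 of the stacked index).
per_row = stacked.groupby(level=0).nunique()

# Rows missing from the grouped result (all-NaN rows) get a count of 0.
df["num_uniq"] = per_row.reindex(df.index, fill_value=0)
print(df["num_uniq"].tolist())  # [1, 2, 3, 0]
```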
    

    Yet another option is apply and nunique:

    df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
    df
    
         0    1    2  num_uniq
    0  new  NaN  NaN         1
    1  new  one  one         2
    2    a    b    c         3
    3  NaN  NaN  NaN         0
    

    Performance

    df_ = df
    df = pd.concat([df_] * 1000, ignore_index=True)
    
    %timeit df['num_uniq'] = [len(set(v[pd.notna(v)])) for v in df.values]
    %timeit df['num_uniq'] = df.stack().groupby(level=0).nunique()
    %timeit df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
    %timeit df['num_uniq'] = df.nunique(1)
    
    196 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    6.34 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    679 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    3.21 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
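%timeit is an IPython magic, so here is a rough plain-Python equivalent using the standard timeit module. The function names are hypothetical, the frame is assumed to match the question, and the sizes and repeat counts are arbitrary; expect different numbers on your machine and pandas version. This sketch times each approach on the same untouched frame without writing a result column back into it, and the stack variant uses reindex so all four return comparable counts.

```python
import timeit

import numpy as np
import pandas as pd

# Frame assumed to match the question's table, repeated 1000 times.
base = pd.DataFrame({
    0: ["new", "new", "a", np.nan],
    1: [np.nan, "one", "b", np.nan],
    2: [np.nan, "one", "c", np.nan],
})
big = pd.concat([base] * 1000, ignore_index=True)


def with_set(frame):
    return [len(set(v[pd.notna(v)].tolist())) for v in frame.to_numpy()]


def with_stack(frame):
    counts = frame.stack().groupby(level=0).nunique()
    return counts.reindex(frame.index, fill_value=0)


def with_apply(frame):
    return frame.apply(pd.Series.nunique, axis=1)


def with_nunique(frame):
    return frame.nunique(axis=1)


methods = {
    "set": with_set,
    "stack": with_stack,
    "apply": with_apply,
    "nunique": with_nunique,
}

runs = 3
for name, func in methods.items():
    # Average wall-clock time of one call on the unchanged frame.
    seconds = timeit.timeit(lambda f=func: f(big), number=runs) / runs
    print(f"{name:>8}: {seconds * 1000:.2f} ms per call")
```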
    
  • 2021-01-22 20:59

    Just use nunique(axis=1).

    import numpy as np
    import pandas as pd
    
    data={0:['new','new','a',np.nan],
         1:[np.nan,'one','b', np.nan],
         2:[np.nan,np.nan,'c',np.nan]}
    df = pd.DataFrame(data)
    
    print(df.nunique(axis=1))
    
    df['num_unique'] = df.nunique(axis=1)
    

