First non-null value per row from a list of Pandas columns

难免孤独 2020-11-27 19:23

If I've got a DataFrame in pandas that looks something like:

    A   B   C
0   1 NaN   2
1 NaN   3 NaN
2 NaN   4   5
3 NaN NaN NaN

How can I get the first non-null value in each row across these columns?

9 answers
  • 2020-11-27 20:07
    import numpy
    import pandas
    
    df = pandas.DataFrame({'A': [1, numpy.nan, numpy.nan, numpy.nan], 'B': [numpy.nan, 3, 4, numpy.nan], 'C': [2, numpy.nan, 5, numpy.nan]})
    
    df
         A    B    C
    0  1.0  NaN  2.0
    1  NaN  3.0  NaN
    2  NaN  4.0  5.0
    3  NaN  NaN  NaN
    
    df.apply(lambda x: numpy.nan if all(x.isnull()) else x[x.first_valid_index()], axis=1).tolist()
    [1.0, 3.0, 4.0, nan]
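A self-contained sketch of the same row-wise apply idea, wrapped in a helper. The name first_non_null_apply is hypothetical (not part of pandas), and the example frame reproduces the one from the question. It returns a Series aligned to the original index; call .tolist() on it for a plain list.

```python
import numpy as np
import pandas as pd


def first_non_null_apply(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: first non-null value in each row (NaN if the row is all null)."""
    def first_in_row(row: pd.Series):
        label = row.first_valid_index()  # None when every value in the row is null
        return np.nan if label is None else row[label]

    # axis=1 passes each row to first_in_row as a Series indexed by column labels
    return frame.apply(first_in_row, axis=1)


df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan],
                   'C': [2, np.nan, 5, np.nan]})
result = first_non_null_apply(df)
print(result.tolist())
```

Checking first_valid_index for None avoids scanning the row twice (once for all(x.isnull()) and once for the lookup).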
    
  • 2020-11-27 20:11

    This is nothing new; it combines the best bits of @yangie's approach with a list comprehension and @EdChum's df.apply approach, in a form that I think is the easiest to understand.

    First, which columns do we want to pick our values from?

    In [95]: pick_cols = df.apply(pd.Series.first_valid_index, axis=1)
    
    In [96]: pick_cols
    Out[96]: 
    0       A
    1       B
    2       B
    3    None
    dtype: object
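Series.first_valid_index returns the label of the first non-null entry, or None when every entry is null, which is where the None for row 3 comes from. A quick check on two standalone Series whose values are made up to mirror rows 1 and 3 of the example:

```python
import numpy as np
import pandas as pd

# Mirrors row 1 of the example frame: only B is non-null
row_1 = pd.Series([np.nan, 3.0, np.nan], index=['A', 'B', 'C'])
# Mirrors row 3: every value is null
row_3 = pd.Series([np.nan, np.nan, np.nan], index=['A', 'B', 'C'])

print(row_1.first_valid_index())  # B
print(row_3.first_valid_index())  # None
```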
    

    Now how do we pick the values?

    In [100]: [df.loc[k, v] if v is not None else None 
        ....:     for k, v in pick_cols.items()]
    Out[100]: [1.0, 3.0, 4.0, None]
    

    This is ok, but we really want the index to match that of the original DataFrame:

    In [98]: pd.Series({k: df.loc[k, v] if v is not None else None
       ....:     for k, v in pick_cols.items()})
    Out[98]: 
    0     1
    1     3
    2     4
    3   NaN
    dtype: float64
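The two steps above as one self-contained script. Series.iteritems was deprecated in pandas 1.5 and removed in 2.0, so this sketch uses .items(). It also tests the picked label with pd.notna rather than "is not None", an assumption made so the check still works if a missing label ever surfaces as NaN instead of None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan],
                   'C': [2, np.nan, 5, np.nan]})

# Step 1: label of the first non-null column in each row (None if the row is all null)
pick_cols = df.apply(pd.Series.first_valid_index, axis=1)

# Step 2: look up each (row, column) pair, keeping the original row index
first_values = pd.Series(
    {k: df.loc[k, v] if pd.notna(v) else np.nan for k, v in pick_cols.items()}
)
print(first_values)
```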
    
  • 2020-11-27 20:11

    groupby with axis=1

    If we pass a callable that returns the same value for every column label, all the columns end up in a single group. We can then use the groupby aggregation first, which takes the first non-null value in each group, and that makes this easy:

    df.groupby(lambda x: 'Z', axis=1).first()
    
         Z
    0  1.0
    1  3.0
    2  4.0
    3  NaN
    

    This returns a DataFrame with a single column, named after the value the callable returns (here 'Z').
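DataFrame.groupby with axis=1 is deprecated as of pandas 2.1, and the deprecation warning suggests transposing the frame instead. A sketch of the same trick on the transposed frame, still grouping everything under the arbitrary key 'Z' from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan],
                   'C': [2, np.nan, 5, np.nan]})

# After transposing, the original columns A, B, C are rows. Mapping every row
# label to 'Z' puts them in one group, and first() then takes the first non-null
# value down each column, i.e. along each original row. The final .T restores
# the original row index.
first_values = df.T.groupby(lambda label: 'Z').first().T

print(first_values)
```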


    lookup, notna, and idxmax

    df.lookup(df.index, df.notna().idxmax(1))
    
    array([ 1.,  3.,  4., nan])
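DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. The sketch below follows the factorize-and-reindex replacement pattern described in the pandas documentation, applied to the same idxmax labels. For the all-null row, idxmax on the boolean mask returns the first column ('A'), whose value is NaN, so that row still comes out as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan],
                   'C': [2, np.nan, 5, np.nan]})

# Label of the first non-null column in each row
col_labels = df.notna().idxmax(axis=1)

# Integer codes per row, plus the unique labels those codes refer to
codes, uniques = pd.factorize(col_labels)

# Reorder the columns to match the codes, then pick one cell per row
values = df.reindex(uniques, axis=1).to_numpy()[np.arange(len(df)), codes]
print(values)
```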
    

    argmin and slicing

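    import numpy as np
    
    v = df.to_numpy()
    # argmin on the null mask finds the first False (first non-null) in each row
    v[np.arange(len(df)), np.isnan(v).argmin(1)]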
    
    array([ 1.,  3.,  4., nan])
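The argmin idea wrapped in a hypothetical helper (first_non_null_argmin is an illustrative name, not a pandas function). As an assumption about wider use, it builds the null mask with DataFrame.isna rather than np.isnan, because np.isnan raises a TypeError on object arrays such as those produced by frames with string columns. It also returns a Series with the original index.

```python
import numpy as np
import pandas as pd


def first_non_null_argmin(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: vectorised first non-null value per row."""
    values = frame.to_numpy()
    # True where a cell is null; works for non-float columns too
    null_mask = frame.isna().to_numpy()
    # argmin returns the first False (first non-null) in each row; an all-null
    # row gives position 0, and the value picked there is itself null
    first_pos = null_mask.argmin(axis=1)
    picked = values[np.arange(len(frame)), first_pos]
    return pd.Series(picked, index=frame.index)


df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan],
                   'C': [2, np.nan, 5, np.nan]})
result = first_non_null_argmin(df)
print(result.tolist())
```

Because everything happens on NumPy arrays, this avoids calling a Python function per row, unlike the apply-based answers.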
    