First column name with non null value by row pandas

青春惊慌失措 2021-01-03 08:59

I want to know the first year with incoming revenue for each of several projects.

Given the following dataframe:

ID  Y1      Y2      Y3
0   NaN     8       4
1   NaN     NaN     1
2   NaN     NaN     NaN
3   5       3       NaN

3 answers
  • 2021-01-03 09:23

    Avoiding apply is preferable because it is not vectorized: with axis=1 it calls a Python function once per row. The approach below is vectorized and was tested with pandas 1.1.

    Setup

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'Y1':[np.nan, np.nan, np.nan, 5],'Y2':[8, np.nan, np.nan, 3], 'Y3':[4, 1, np.nan, np.nan]})
    
    # df.dropna(how='all', inplace=True)  # Optional but cleaner
    
    # For ranking only:
    col_ranks = pd.DataFrame(index=df.columns, data=np.arange(1, 1 + len(df.columns)), columns=['first_notna_rank'], dtype='UInt8') # UInt8 supports max value of 255.
    

    To find the name of the first non-null column

    df['first_notna_name'] = df.dropna(how='all').notna().idxmax(axis=1).astype('string')
    

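    If df has no rows that are entirely null, the dropna(how='all') above can be removed. Otherwise keep it: for an all-null row, notna() is all False and idxmax(axis=1) returns the first column label (Y1) instead of a missing value.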
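
To see why the dropna matters, here is a minimal sketch on the same example frame as the setup. An all-null row becomes a row of all False after notna(), and idxmax then reports the first column label, because every value ties and idxmax returns the first occurrence of the maximum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

# Without dropna: row 2 is all-null, so idxmax falls back to the first label 'Y1'
naive = df.notna().idxmax(axis=1)
print(naive.tolist())  # ['Y2', 'Y3', 'Y1', 'Y1']

# With dropna: row 2 is simply absent, so it aligns to a missing value when assigned back to df
fixed = df.dropna(how='all').notna().idxmax(axis=1)
print(fixed.index.tolist(), fixed.tolist())  # [0, 1, 3] ['Y2', 'Y3', 'Y1']
```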

    To then find the first non-null value

    If df has no rows with all nulls:

    # DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0
    df['first_notna_value'] = df.lookup(row_labels=df.index, col_labels=df['first_notna_name'])
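
On pandas 2.0 and later, where DataFrame.lookup no longer exists, a minimal sketch of an equivalent is shown here. It assumes every row has at least one non-null value (so every row has a valid first_notna_name) and that the value columns are Y1, Y2, Y3 as in the setup. Index.get_indexer turns the names into column positions, and NumPy fancy indexing then picks one cell per row.

```python
import numpy as np
import pandas as pd

# Example frame with no all-null rows, so every row has a valid first column
df = pd.DataFrame({'Y1': [np.nan, np.nan, 5],
                   'Y2': [8, np.nan, 3],
                   'Y3': [4, 1, np.nan]})
df['first_notna_name'] = df.notna().idxmax(axis=1)

value_cols = ['Y1', 'Y2', 'Y3']
# Column position of each row's first non-null column
col_pos = pd.Index(value_cols).get_indexer(df['first_notna_name'])
# Pick one cell per row: (row i, column col_pos[i])
df['first_notna_value'] = df[value_cols].to_numpy()[np.arange(len(df)), col_pos]
print(df['first_notna_value'].tolist())  # [8.0, 1.0, 5.0]
```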
    

    If df may have rows with all nulls: (inefficient)

    df['first_notna_value'] = df.drop(columns='first_notna_name').bfill(axis=1).iloc[:, 0]
    

    To get the rank (the 1-based column position) of that name:

    df = df.merge(col_ranks, how='left', left_on='first_notna_name', right_index=True)
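
Because the rank is just the 1-based position of the column, the merge against col_ranks can also be replaced by a positional lookup. A minimal sketch, assuming the value columns are Y1, Y2, Y3 as in the setup. Index.get_indexer returns -1 for a name it cannot find (here, the missing name of an all-null row), and those entries are set back to <NA> in a nullable UInt8 array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})
value_cols = ['Y1', 'Y2', 'Y3']

# First non-null column name per row; NaN for rows that are entirely null
notna = df[value_cols].notna()
first_name = notna.idxmax(axis=1).where(notna.any(axis=1))

# 0-based column position, or -1 when the name is missing (all-null row)
pos = pd.Index(value_cols).get_indexer(first_name)

# 1-based rank as nullable UInt8, with <NA> where no column was found
rank = pd.array(pos + 1, dtype='UInt8')
rank[pos == -1] = pd.NA
df['first_notna_rank'] = rank
print(df)
```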
    

    Is there a better way?

    Output

        Y1   Y2   Y3 first_notna_name  first_notna_value  first_notna_rank
    0  NaN  8.0  4.0               Y2                8.0                 2
    1  NaN  NaN  1.0               Y3                1.0                 3
    2  NaN  NaN  NaN             <NA>                NaN              <NA>
    3  5.0  3.0  NaN               Y1                5.0                 1
    

    Partial credit: answers by piRSquared and Andy

  • 2021-01-03 09:28

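    Apply this code to a DataFrame with only one row to return the name of the first column whose value is not null (it raises an IndexError if every value in the row is null).

    # row is a one-row DataFrame: keep the columns whose value is not null, then take the first
    row.columns[row.notna().all()][0]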
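
A minimal sketch that runs the expression on every one-row slice of the example frame (df.iloc[[i]] keeps a one-row DataFrame rather than a Series). The helper name first_notna_col is hypothetical; it returns None for an all-null row instead of letting [0] raise IndexError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

def first_notna_col(row):
    """Hypothetical helper: name of the first non-null column of a one-row DataFrame."""
    non_null = row.columns[row.notna().all()]
    # Guard against all-null rows, where indexing [0] would raise IndexError
    return non_null[0] if len(non_null) else None

print([first_notna_col(df.iloc[[i]]) for i in range(len(df))])
# ['Y2', 'Y3', None, 'Y1']

# Indexing with [-1] instead picks the *last* non-null column
row0 = df.iloc[[0]]
last = row0.columns[row0.notna().all()][-1]
print(last)  # Y3
```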

  • 2021-01-03 09:35

    You can call first_valid_index on each row by passing a lambda to apply with axis=1, which makes apply operate row by row.

    >>> df.apply(lambda row: row.first_valid_index(), axis=1)
    ID
    0      Y2
    1      Y3
    2    None
    3      Y1
    dtype: object
    

    To apply it to your dataframe:

    df = df.assign(first=df.apply(lambda row: row.first_valid_index(), axis=1))
    
    >>> df
        Y1  Y2  Y3 first
    ID                  
    0  NaN   8   4    Y2
    1  NaN NaN   1    Y3
    2  NaN NaN NaN  None
    3    5   3 NaN    Y1
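
The apply call above runs one Python function per row. A minimal vectorized sketch that should give the same names on the example frame, using notna().idxmax(axis=1) for the label of the first True per row and np.where to put None on rows with no non-null value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]},
                  index=pd.Index([0, 1, 2, 3], name='ID'))

# Row by row: one Python call per row (the approach in this answer)
looped = df.apply(lambda row: row.first_valid_index(), axis=1)

# Vectorized: label of the first True per row, then None where a row has no True at all
notna = df.notna()
vectorized = pd.Series(np.where(notna.any(axis=1), notna.idxmax(axis=1), None),
                       index=df.index)

# Both should give Y2, Y3, a missing value, Y1
print(looped.tolist())
print(vectorized.tolist())
```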
    