First column name with non null value by row pandas

I want to know the first year with incoming revenue for various projects.

Given the following dataframe:

ID  Y1      Y2      Y3
0   NaN     8       4
1


                      
3 Answers

情歌与酒
2021-01-03 09:23

Avoiding apply is preferable because it is not vectorized. The following approach is vectorized and was tested with pandas 1.1.
Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1':[np.nan, np.nan, np.nan, 5],'Y2':[8, np.nan, np.nan, 3], 'Y3':[4, 1, np.nan, np.nan]})

# df.dropna(how='all', inplace=True)  # Optional but cleaner

# For ranking only:
col_ranks = pd.DataFrame(index=df.columns, data=np.arange(1, 1 + len(df.columns)), columns=['first_notna_rank'], dtype='UInt8') # UInt8 supports max value of 255.

To find the name of the first non-null column
df['first_notna_name'] = df.dropna(how='all').notna().idxmax(axis=1).astype('string')

If df has no rows with all nulls, dropna(how='all') above can be removed.
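The reason dropna matters here: on a row where every value is null, notna() is all False, and idxmax then returns the first column label rather than a missing value. A minimal sketch of that behaviour, with masking via where as one possible alternative to dropna (the names mask, naive and first_name are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Y1': [np.nan, np.nan, np.nan, 5],
    'Y2': [8, np.nan, np.nan, 3],
    'Y3': [4, 1, np.nan, np.nan],
})

mask = df.notna()

# On the all-null row (index 2) every entry of mask is False,
# so idxmax falls back to the first column label, 'Y1'.
naive = mask.idxmax(axis=1)

# Keep the label only where the row has at least one valid value.
first_name = naive.where(mask.any(axis=1))

print(naive.tolist())
print(first_name.tolist())
```

Compared with dropna(how='all'), this keeps the original index intact without relying on index alignment during assignment.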
To then find the first non-null value
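If df has no rows with all nulls (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this line only runs on older versions):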
df['first_notna_value'] = df.lookup(row_labels=df.index, col_labels=df['first_notna_name'])

If df may have rows with all nulls (less efficient):
df['first_notna_value'] = df.drop(columns='first_notna_name').bfill(axis=1).iloc[:, 0]
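On pandas versions without DataFrame.lookup, the first non-null value can be picked with plain NumPy indexing instead. This is a sketch, not part of the original answer; it assumes all value columns are numeric, and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Y1': [np.nan, np.nan, np.nan, 5],
    'Y2': [8, np.nan, np.nan, 3],
    'Y3': [4, 1, np.nan, np.nan],
})

values = df.to_numpy()               # float64 array, shape (4, 3)
mask = df.notna().to_numpy()

# argmax gives the position of the first True per row (0 if there is none).
first_pos = mask.argmax(axis=1)
has_valid = mask.any(axis=1)

# Pick one element per row, then blank out rows with no valid value.
picked = values[np.arange(len(df)), first_pos]
first_value = np.where(has_valid, picked, np.nan)

print(first_value)
```

This handles all-null rows as well, so it covers both cases above in one pass.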

To rank the name
df = df.merge(col_ranks, how='left', left_on='first_notna_name', right_index=True)

Is there a better way?
Output
    Y1   Y2   Y3 first_notna_name  first_notna_value  first_notna_rank
0  NaN  8.0  4.0               Y2                8.0                 2
1  NaN  NaN  1.0               Y3                1.0                 3
2  NaN  NaN  NaN             <NA>                NaN              <NA>
3  5.0  3.0  NaN               Y1                5.0                 1
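The rank column in the output above can also be derived without a merge, by mapping each column label to its 1-based position. A rough sketch, where rank_by_col is a hypothetical helper dict and not something from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Y1': [np.nan, np.nan, np.nan, 5],
    'Y2': [8, np.nan, np.nan, 3],
    'Y3': [4, 1, np.nan, np.nan],
})

mask = df.notna()
first_name = mask.idxmax(axis=1).where(mask.any(axis=1))

# Hypothetical helper: column label -> 1-based position.
rank_by_col = {col: pos for pos, col in enumerate(df.columns, start=1)}

# Labels not found in the dict (the all-null row) map to NaN,
# which the nullable UInt8 dtype stores as <NA>.
first_rank = first_name.map(rank_by_col).astype('UInt8')

print(first_rank)
```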


Partial credit: answers by piRSquared and Andy
                                                                        
                                                        
            
            
              
                
走了就别回头了
2021-01-03 09:28

Apply this code to a dataframe with only one row to return the first column in the row that contains a non-null value. It raises an IndexError if every value in the row is null.

row.columns[row.notna().all()][0]
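For trying the one-row idea on the sample data, here is a small sketch. The function name first_valid_column is hypothetical, and returning None for an all-null row is an assumption about the desired behaviour:

```python
import numpy as np
import pandas as pd

def first_valid_column(row: pd.DataFrame):
    """Return the label of the first non-null column of a one-row frame, or None."""
    valid = row.columns[row.notna().all().to_numpy()]
    return valid[0] if len(valid) else None

df = pd.DataFrame({
    'Y1': [np.nan, np.nan, np.nan, 5],
    'Y2': [8, np.nan, np.nan, 3],
    'Y3': [4, 1, np.nan, np.nan],
})

# df.iloc[[i]] (double brackets) keeps the row as a one-row DataFrame.
print(first_valid_column(df.iloc[[0]]))  # Y2
print(first_valid_column(df.iloc[[2]]))  # None
```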
                                                                        
                                                        
            
            
              
                
借酒劲吻你
2021-01-03 09:35

You can apply first_valid_index to each row in the dataframe using a lambda expression with axis=1 to specify rows.

>>> df.apply(lambda row: row.first_valid_index(), axis=1)
ID
0      Y2
1      Y3
2    None
3      Y1
dtype: object


To apply it to your dataframe:

df = df.assign(first = df.apply(lambda row: row.first_valid_index(), axis=1))

>>> df
    Y1  Y2  Y3 first
ID                  
0  NaN   8   4    Y2
1  NaN NaN   1    Y3
2  NaN NaN NaN  None
3    5   3 NaN    Y1
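Since apply runs a Python function per row, it may be worth checking that a vectorized alternative gives the same labels before switching. A sketch on random data; the frame big, its size and the 50% null rate are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.random((1000, 5))
data[rng.random((1000, 5)) < 0.5] = np.nan   # knock out about half the cells
big = pd.DataFrame(data, columns=[f'Y{i}' for i in range(1, 6)])

# Row-wise approach from this answer.
by_apply = big.apply(lambda row: row.first_valid_index(), axis=1)

# Vectorized approach: first True per row, masked where the row is all null.
mask = big.notna()
by_idxmax = mask.idxmax(axis=1).where(mask.any(axis=1))

# Compare after replacing missing labels (None / NaN) with an empty string.
same = (by_apply.fillna('') == by_idxmax.fillna('')).all()
print(same)
```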

                                                                        
                                                        
            
            
              
                