How to add a new column without a length-mismatch ValueError when rows match a where/mask condition after groupby in Python/pandas?


悲哀的现实 2021-01-26 08:43

I am trying to filter out inner data in my large data frame (1,400,000 rows).
This is a very short and simple version of the sample data:

a      b        c       dt

1 answer

再見小時候 (original poster), 2021-01-26 09:07

If you only want a new column that flags the rows matching the mask, as @Quang Hoang suggested, you could try this:
import pandas as pd
import io
s_e='''
a      b        c       dt                   e
35   0.1      234   2020/6/15 14:27:00       0
1    0.1      554   2020/6/15 15:28:00       1
2    0.2      654   2020/6/15 16:29:00       0
23   0.4      2345  2020/6/15 17:26:00       0
34   0.8      245   2020/6/15 18:25:00       0
8    0.9      123   2020/6/15 18:26:00       0
7    0.1      22    2020/6/15 18:27:00       0
2    0.3      99    2020/6/15 18:28:00       0
219  0.2      17    2020/6/15 19:26:00       0
'''
df = pd.read_csv(io.StringIO(s_e), sep=r'\s\s+', parse_dates=[3], engine='python')
print(df)
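# masking the first set of conditions:
mask = (df['a'].lt(25) & df['a'].gt(10)) | df['b'].gt(0.2) | df['c'].gt(500)
# ...and keep only rows where e == 0
mask = mask & df['e'].eq(0)

# Quang Hoang's recommendation: the mask has one value per row, so casting it
# to int gives a full-length column and no length-mismatch error
df['indicator'] = mask.astype(int)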

print(df)

Output (of the final print(df)):
     a    b     c                  dt  e  indicator
0   35  0.1   234 2020-06-15 14:27:00  0          0
1    1  0.1   554 2020-06-15 15:28:00  1          0
2    2  0.2   654 2020-06-15 16:29:00  0          1
3   23  0.4  2345 2020-06-15 17:26:00  0          1
4   34  0.8   245 2020-06-15 18:25:00  0          1
5    8  0.9   123 2020-06-15 18:26:00  0          1
6    7  0.1    22 2020-06-15 18:27:00  0          0
7    2  0.3    99 2020-06-15 18:28:00  0          1
8  219  0.2    17 2020-06-15 19:26:00  0          0
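The length-mismatch ValueError from the title typically appears when a filtered (shorter) result is assigned back as a plain list or array. A minimal sketch, using a small hypothetical frame rather than the question's data, of why that fails and of two full-length alternatives (the boolean mask cast to int, and numpy.where); it also shows that assigning a shorter Series aligns on the index instead of raising:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, only to illustrate the assignment rules.
df = pd.DataFrame({"a": [35, 1, 23, 7], "e": [0, 1, 0, 0]})
mask = df["a"].gt(10) & df["e"].eq(0)

# Assigning a shorter array raises:
# ValueError: Length of values (...) does not match length of index (...)
try:
    df["bad"] = df.loc[mask, "a"].to_numpy()
except ValueError as exc:
    print("ValueError:", exc)

# Full-length alternatives: the mask has one value per row, so no mismatch.
df["indicator"] = mask.astype(int)
df["indicator_np"] = np.where(mask, 1, 0)

# Assigning a shorter *Series* does not raise: pandas aligns on the index
# and fills the unmatched rows with NaN.
df["a_matched"] = df.loc[mask, "a"]
print(df)
```

The mask-based forms are usually the simplest fix, since the mask is built from df itself and is always as long as df.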

If you want to flag only the rows that match the mask and also have the minimum c value within each 30-minute window, you could try:
import pandas as pd
import io
s_e='''
a      b        c       dt                   e
35   0.1      234   2020/6/15 14:27:00       0
1    0.1      554   2020/6/15 15:28:00       1
2    0.2      654   2020/6/15 16:29:00       0
23   0.4      2345  2020/6/15 17:26:00       0
34   0.8      245   2020/6/15 18:25:00       0
8    0.9      123   2020/6/15 18:26:00       0
7    0.1      22    2020/6/15 18:27:00       0
2    0.3      99    2020/6/15 18:28:00       0
219  0.2      17    2020/6/15 19:26:00       0
'''
df = pd.read_csv(io.StringIO(s_e), sep=r'\s\s+', parse_dates=[3], engine='python')
print(df)
# masking the first set of conditions:
mask = (df['a'].lt(25) & df['a'].gt(10) ) | df['b'].gt(0.2) | df['c'].gt(500)
mask = mask & df['e'].eq(0)
# start every row at 0; matching rows are set to 1 below
df['indicator'] = 0

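temp = df[mask]
# select the row with the minimum `c` in each 30-minute window (among masked rows)
c_min = temp.groupby(temp['dt'].dt.floor('30min'))['c'].idxmin()

# final df: idxmin returns index labels of df, so .loc writes back to the right rows
df.loc[c_min, 'indicator'] = 1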
print(df)

Output (of the final print(df)):
     a    b     c                  dt  e  indicator
0   35  0.1   234 2020-06-15 14:27:00  0          0
1    1  0.1   554 2020-06-15 15:28:00  1          0
2    2  0.2   654 2020-06-15 16:29:00  0          1
3   23  0.4  2345 2020-06-15 17:26:00  0          1
4   34  0.8   245 2020-06-15 18:25:00  0          0
5    8  0.9   123 2020-06-15 18:26:00  0          0
6    7  0.1    22 2020-06-15 18:27:00  0          0
7    2  0.3    99 2020-06-15 18:28:00  0          1
8  219  0.2    17 2020-06-15 19:26:00  0          0
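An alternative sketch, not the answer's method: instead of idxmin plus .loc, compare each masked row's c with its 30-minute window minimum computed by groupby(...).transform('min'). It rebuilds the answer's sample rows as a DataFrame literal instead of parsing text. One behavioral difference to be aware of: on ties, every row equal to the window minimum is flagged, whereas idxmin flags only the first.

```python
import pandas as pd

# Same sample rows as the answer (columns a, b, c, dt, e).
df = pd.DataFrame({
    "a": [35, 1, 2, 23, 34, 8, 7, 2, 219],
    "b": [0.1, 0.1, 0.2, 0.4, 0.8, 0.9, 0.1, 0.3, 0.2],
    "c": [234, 554, 654, 2345, 245, 123, 22, 99, 17],
    "dt": pd.to_datetime([
        "2020-06-15 14:27", "2020-06-15 15:28", "2020-06-15 16:29",
        "2020-06-15 17:26", "2020-06-15 18:25", "2020-06-15 18:26",
        "2020-06-15 18:27", "2020-06-15 18:28", "2020-06-15 19:26",
    ]),
    "e": [0, 1, 0, 0, 0, 0, 0, 0, 0],
})

mask = (
    (df["a"].lt(25) & df["a"].gt(10)) | df["b"].gt(0.2) | df["c"].gt(500)
) & df["e"].eq(0)

# 30-minute bucket for every row, e.g. 18:25, 18:26 and 18:28 all floor to 18:00.
window = df["dt"].dt.floor("30min")

# Per-window minimum of c among masked rows only; transform keeps the masked
# rows' index, and reindex extends it to all rows (unmasked rows become NaN).
window_min = (
    df.loc[mask, "c"].groupby(window[mask]).transform("min").reindex(df.index)
)

# NaN never compares equal, so rows outside the mask end up as 0.
df["indicator"] = df["c"].eq(window_min).astype(int)
print(df)
```

On this sample it flags the same rows as the idxmin version (index 2, 3 and 7).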
