xgboost predict method returns the same predicted value for all rows

南方客 2020-12-09 13:37

I've created an xgboost classifier in Python:

train is a pandas DataFrame with 100k rows and 50 features as columns. target is a pandas Series.

xgb_cla         
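
The snippet above is truncated; a minimal sketch of the kind of setup being described -- the exact parameters are assumptions pieced together from the answers below (a regression objective and max_depth=3) -- might look like:

    import xgboost as xgb

    # Assumed reconstruction, not the asker's exact code; 'test' is a
    # held-out feature set. Note the regression objective -- the very
    # thing the first answer flags.
    xgb_classifier = xgb.XGBClassifier(nthread=-1, max_depth=3, silent=0,
                                       objective='reg:linear', n_estimators=100)
    xgb_classifier.fit(train, target)
    predictions = xgb_classifier.predict(test)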


6 answers
  • 2020-12-09 13:55

    You need to post a reproducible example for any real investigation. It's entirely likely that your response target is highly unbalanced and that your training data is not super predictive, thus you always (or almost always) get one class predicted. Have you looked at the predicted probabilities at all to see if there is any variance? Is it just an issue of not using the proper cut-off for classification labels?

    Since you said that a RF gave reasonable predictions, it would be useful to see your training parameters for that. At a glance, it's curious why you're using a regression objective function in your xgboost call -- that could easily be why you are seeing such poor performance. Try changing your objective to 'binary:logistic'.
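
    For instance (a sketch, assuming the sklearn-style XGBClassifier API from the question; 'test' is a held-out feature set):

        import xgboost as xgb

        # A classification objective instead of the regression one:
        clf = xgb.XGBClassifier(objective='binary:logistic', max_depth=3,
                                n_estimators=100)
        clf.fit(train, target)

        # Look at the predicted probabilities before concluding the model is broken:
        proba = clf.predict_proba(test)[:, 1]
        print(proba.min(), proba.max(), proba.std())

        # If the probabilities do vary, the fix may just be a different cut-off:
        labels = (proba > 0.3).astype(int)  # 0.3 is an arbitrary example threshold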

  • 2020-12-09 13:59

    This question has received several responses, both in this thread and in other threads.

    I was having a similar issue with both XGBoost and LGBM. For me, the solution was to increase the size of the training dataset.

    I was training on a local machine using a random sample (~0.5%) of a large, sparse dataset (200,000 rows and 7,000 columns) because I did not have enough local memory for the algorithm. It turned out that the array of predicted values was just an array of the average value of the target variable, which suggests the model was underfitting. One solution to an underfitting model is to train it on more data, so I reran the analysis on a machine with more memory and the issue was resolved: the prediction array was no longer an array of average target values. Alternatively, the issue could simply have been that the slice of predicted values I was looking at was predicted from training data with very little information (e.g. 0s and NaNs). For training data with very little information, predicting the average value of the target feature seems reasonable.

    None of the other suggested solutions I came across were helpful for me. To summarize, the other suggestions included: 1) check whether gamma is too high; 2) make sure your target labels are not included in your training dataset; 3) max_depth may be too small.
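
    A quick sketch of the underfitting check described above (model, X_test and y_train are illustrative names):

        import numpy as np

        preds = model.predict(X_test)

        # An underfit model collapses every prediction to (roughly) the
        # average of the training target:
        print("target mean:", np.mean(y_train))
        print("unique predictions:", np.unique(preds)[:5])
        if np.allclose(preds, np.mean(y_train), atol=1e-3):
            print("predictions equal the target mean -- likely underfitting")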

  • 2020-12-09 14:00

    One possible reason is that you're applying too high a penalty through the gamma parameter. Compare the mean of your training response variable with the predictions and check whether they are close. If they are, the model is restricting its predictions too much in order to keep train-rmse and val-rmse as close as possible. The higher the gamma, the simpler the prediction becomes, so you end up with a naive prediction such as the mean of the training set.
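
    For instance (a sketch; the gamma values are placeholders):

        import numpy as np
        import xgboost as xgb

        # With a very large gamma, almost no split clears the required loss
        # reduction, so the trees stay trivial and the output is near-constant:
        strict = xgb.XGBRegressor(gamma=1000.0).fit(train, target)
        print(np.unique(strict.predict(train))[:5], "vs train mean:", target.mean())

        # gamma defaults to 0 (no extra split penalty); lowering it lets the
        # trees actually split:
        relaxed = xgb.XGBRegressor(gamma=0.0).fit(train, target)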

  • 2020-12-09 14:00

    I have tried all solutions on this page, but none worked.

    Because I was grouping time-series data, certain frequencies created gaps in the data. I solved the issue by filling in all the NaNs.
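
    A sketch of that situation (df and timestamp are illustrative names):

        import pandas as pd

        # Resampling to a fixed frequency inserts NaN rows wherever the
        # original series has gaps:
        ts = df.set_index('timestamp').resample('1H').mean()

        # Filling those gaps before training removed the constant predictions:
        ts = ts.fillna(0)   # or ts.ffill(), depending on the feature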

  • 2020-12-09 14:03

    Isn't max_depth=3 too small? Try increasing it; the default value is 6 if I remember correctly. Also, set silent to 0 so you can monitor the error at each boosting round.
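
    For example (a sketch; the values are only suggestions):

        import xgboost as xgb

        clf = xgb.XGBClassifier(max_depth=6,   # up from 3; 6 is the library default
                                n_estimators=100)

        # Pass an eval_set and keep output verbose to watch the error per round:
        clf.fit(train, target, eval_set=[(train, target)], verbose=True)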

  • 2020-12-09 14:18

    The hyper-parameters you're using are probably the cause. Try the default values first. In my case, the problem was solved by removing the subsample and min_child_weight hyper-parameters from params.
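
    For example, with the params-dict API (the values shown are placeholders):

        # Before: a params dict whose values can silently cripple learning
        params = {
            'objective': 'binary:logistic',
            'max_depth': 6,
            'subsample': 0.1,          # suspect
            'min_child_weight': 100,   # suspect
        }

        # After: drop the suspect keys and fall back to the library defaults
        for key in ('subsample', 'min_child_weight'):
            params.pop(key, None)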
