Why does the predicted polynomial change drastically when only the resolution of the prediction grid changes?


Why do I get different predictions from the exact same model when I only change the prediction grid (step 0.001 vs. step 0.01)?

set.seed(0)


                      
2 Answers

感情败类
2021-01-20 19:41

First of all, the predicted lines don't fit the original data. You did not reuse the original poly object when building the polynomial terms for prediction.

...
poly_ori <- poly(x, poly_df)    # important
...   

plot(x, y)

# fine grid: evaluate the basis with the coefs stored in poly_ori
x_plt1 = seq(-1, 1, 0.001)
x_plt_exp1 = as.data.frame(poly(x_plt1, poly_df, coefs = attr(poly_ori, "coefs")))
lines(x_plt1, predict(fit, x_plt_exp1), lwd = 3, col = 2)

# coarse grid: same coefs, so the curve overlaps the one above
x_plt2 = seq(-1, 1, 0.01)
x_plt_exp2 = as.data.frame(poly(x_plt2, poly_df, coefs = attr(poly_ori, "coefs")))
lines(x_plt2, predict(fit, x_plt_exp2), lwd = 3, col = 3)
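
To see why the grid matters when the basis is rebuilt, here is a minimal sketch. The question's full code is not shown, so the simulated x below is an assumption borrowed for illustration, and the names fresh_*, reuse_* and *_gap are hypothetical. It compares the basis values at the same point, x = -1, on the two grids.

```r
# Assumed training data (the question's own data are not shown)
set.seed(0)
x <- runif(2000) - 0.5
poly_df <- 5
poly_ori <- poly(x, poly_df)            # basis built on the training x

g_fine   <- seq(-1, 1, 0.001)
g_coarse <- seq(-1, 1, 0.01)

# Rebuilding the basis on each grid: every column is rescaled to unit
# length, so the values depend on how many grid points there are.
fresh_fine   <- poly(g_fine, poly_df)
fresh_coarse <- poly(g_coarse, poly_df)
unit_norm <- isTRUE(all.equal(unname(colSums(fresh_fine^2)), rep(1, poly_df)))

# Reusing the training construction: values depend only on x itself.
reuse_fine   <- poly(g_fine, poly_df, coefs = attr(poly_ori, "coefs"))
reuse_coarse <- poly(g_coarse, poly_df, coefs = attr(poly_ori, "coefs"))

# Both grids start at exactly x = -1, so compare their first rows.
fresh_gap <- max(abs(fresh_fine[1, ] - fresh_coarse[1, ]))
reuse_gap <- max(abs(reuse_fine[1, ] - reuse_coarse[1, ]))
print(c(fresh_gap = fresh_gap, reuse_gap = reuse_gap))
```

With a rebuilt basis the same x maps to different regressor values on each grid, so multiplying by the fitted coefficients gives different curves. With the stored coefs the two rows agree.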

                                                                        
                                                        
            
            
              
                
日久生厌
2021-01-20 19:44

This looks like a coding / programming problem: in my quick test, with an appropriate set-up that puts poly() inside the model formula, I can't reproduce it. So I think this question is better suited for Stack Overflow.

## quick test ##

set.seed(0)
x <- runif(2000) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(2000)
plot(x,y)

x_exp <- data.frame(x, y)
fit <- lm(y ~ poly(x, 5), data = x_exp)

x1 <- seq(-1, 1, 0.001)
y1 <- predict(fit, newdata = list(x = x1))
lines(x1, y1, lwd = 5, col = 2)

x2 <- seq(-1, 1, 0.01)
y2 <- predict(fit, newdata = list(x = x2))
lines(x2, y2, lwd = 2, col = 3)
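
A numerical check can back up the visual one. This is a sketch that refits the same quick-test model; the names shared, same_x and same_pred are introduced here for illustration. It compares the predictions at the points the two grids have in common.

```r
set.seed(0)
x <- runif(2000) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(2000)
fit <- lm(y ~ poly(x, 5), data = data.frame(x, y))

x1 <- seq(-1, 1, 0.001)
x2 <- seq(-1, 1, 0.01)
y1 <- predict(fit, newdata = data.frame(x = x1))
y2 <- predict(fit, newdata = data.frame(x = x2))

# every 10th point of the fine grid coincides with a coarse-grid point
shared <- seq(1, length(x1), by = 10)
same_x    <- isTRUE(all.equal(x1[shared], x2))
same_pred <- isTRUE(all.equal(unname(y1[shared]), unname(y2)))
print(c(same_x = same_x, same_pred = same_pred))
```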






cuttlefish44 has pointed out the fault in your implementation. When building the prediction matrix, we want to reuse the construction information stored in the model matrix rather than construct a new basis. If you wonder what this "construction information" is, you can go through this very long answer: How poly() generates orthogonal polynomials? How to understand the “coefs” returned?

Let me give a brief summary here, so that you can skip that long, detailed answer.


The construction of orthogonal polynomials always starts by centring the input covariate values x. If the centre is different, everything that follows is different. This is the difference between poly(x, coefs = NULL) and poly(x, coefs = some_coefficients). The former always constructs a new basis using a new centre, while the latter uses the existing centring information in some_coefficients to evaluate the basis at the given x. This is clearly what we want when making predictions.
poly(x, coefs = some_coefficients) does the same computation as predict() on a poly object, i.e. predict.poly (which I explained in that long answer). It is relatively rare that we need to set the coefs argument ourselves, unless we are testing. If we set up the linear model the way I do in my quick test above, predict.lm is smart enough to predict poly model terms correctly, i.e., internally it does the poly(new_x, coefs = some_coefficients) for us.
As an interesting contrast, ordinary (raw) polynomials don't have this problem. For example, if you specify raw = TRUE in all poly() calls in your code, you will have no trouble. This is because a raw polynomial has no construction information; it just takes powers 1, 2, ..., degree of x.
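
The three points above can be checked with a short sketch. The simulated data follow my quick test; the variable names (basis, new_x, via_predict, and so on) are made up for illustration.

```r
set.seed(0)
x <- runif(2000) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(2000)
new_x <- seq(-1, 1, 0.01)

# 1. The stored construction information: alpha (centres) and norm2.
basis <- poly(x, 5)
coefs <- attr(basis, "coefs")
centre_ok <- isTRUE(all.equal(coefs$alpha[1], mean(x)))  # first centre is mean(x)

# 2. predict() on the poly object and poly(..., coefs = ) agree.
via_predict <- predict(basis, new_x)
via_coefs   <- poly(new_x, 5, coefs = coefs)
same_basis  <- isTRUE(all.equal(as.vector(via_predict), as.vector(via_coefs)))

# A model with poly() inside the formula keeps this information in its
# terms, which is what predict.lm relies on.
fit <- lm(y ~ poly(x, 5))
pv <- attr(terms(fit), "predvars")
print(pv)

# 3. A raw polynomial is just powers of x, so nothing needs to be stored.
raw_basis <- poly(new_x, 5, raw = TRUE)
same_raw  <- isTRUE(all.equal(as.vector(raw_basis),
                              as.vector(outer(new_x, 1:5, `^`))))
print(c(centre_ok = centre_ok, same_basis = same_basis, same_raw = same_raw))
```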

                                                                        
                                                        
            
            
              
                