Why does “vectorizing” this simple R loop give a different result?

故里飘歌 2021-02-12 21:54

Perhaps a very dumb question.

I am trying to "vectorize" the following loop:

set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.2

4 answers

孤街浪徒 (original poster) 2021-02-12 21:57

This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation and walk through the assignments that would occur regardless of how the loop is implemented at the C or assembly (or whatever) language level. The behavior is intrinsic to how any sequential for-loop works, as opposed to a "true" permutation (what one gets with x[sig]):

sample(10)
 [1]  3  7  1  5  6  9 10  8  4  2

i = 1: x[1] <- x[3], so the value at 3 is copied to 1, and now there are two copies of that value
i = 2: x[2] <- x[7], so the value at 7 is copied to 2, and now there are two copies of that value as well
i = 3: x[3] <- x[1], but x[1] already holds the old x[3], so x[3] stays unchanged


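... one can continue, but this already illustrates why the loop will usually not produce a "true" permutation and will only very rarely result in a complete redistribution of values. In fact, only the identity permutation (1:10), which leaves x untouched, can give a new set of x's that are all unique (assuming the original values were distinct): in any cycle of length two or more, the lowest index is overwritten before the index that reads from it is processed, so the same value ends up in two places.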

replicate( 100, {x <- round(runif(10), 2); 
                  sig <- sample.int(10); 
                  for (i in seq_along(sig)){ x[i] <- x[sig[i]]}; 
                  sum(duplicated(x)) } )
 #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
 #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
 #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5
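
The walk-through above can be reproduced on a fixed example. This is only a sketch: the values in x are made up so that they are all distinct, and the names y, z and w are introduced here purely for illustration. It compares the in-place loop with x[sig] and with a loop that reads from an untouched copy.

```r
# Made-up, distinct values so the result does not depend on the RNG.
x   <- c(0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, 0.99, 1.00)
sig <- c(3, 7, 1, 5, 6, 9, 10, 8, 4, 2)   # the permutation printed by sample(10) above

# In-place loop: later iterations may read elements that were already overwritten.
y <- x
for (i in seq_along(sig)) y[i] <- y[sig[i]]

# "True" permutation: x[sig] is built from the original x in one step.
z <- x[sig]

# A loop that reads from the untouched x and writes into a copy matches x[sig].
w <- x
for (i in seq_along(sig)) w[i] <- x[sig[i]]

identical(w, z)      # TRUE
identical(y, z)      # FALSE
sum(duplicated(y))   # 3 for this particular x and sig
```

So if the loop is meant to be a permutation, reading from a copy (or simply using x[sig]) is the equivalent form; the in-place loop computes something different.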


I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:

table( replicate( 1000000, {x <- round(runif(10), 5);
                            sig <- sample.int(10);
                            for (i in seq_along(sig)){ x[i] <- x[sig[i]]};
                            sum(duplicated(x)) } ) )

     0      1      2      3      4      5      6      7      8 
     1    269  13113 126104 360416 360827 125707  13269    294 
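
The single 0 in that table is consistent with the identity permutation being the only one that leaves no duplicates when the values are distinct (it has probability 1/10! per draw). A brute-force check on a smaller vector can be sketched as follows; n = 5, the made-up values in x, and the helper name count_dups are assumptions chosen only to keep the enumeration small.

```r
n <- 5
x <- seq_len(n) / 10                       # made-up distinct values

# Enumerate all permutations of 1:n by filtering the full grid (fine for small n).
g    <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
perm <- g[apply(g, 1, function(r) anyDuplicated(r) == 0), , drop = FALSE]
nrow(perm)                                 # 120 = factorial(5)

# Hypothetical helper: run the in-place loop for one permutation, count duplicates.
count_dups <- function(sig) {
  y <- x
  for (i in seq_along(sig)) y[i] <- y[sig[i]]
  sum(duplicated(y))
}

dups <- apply(perm, 1, count_dups)
table(dups)                                # distribution of duplicate counts
perm[dups == 0, , drop = FALSE]            # the only row is 1 2 3 4 5
```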
