Why does “vectorizing” this simple R loop give a different result?

故里飘歌 2021-02-12 21:54

Perhaps a very dumb question.

I am trying to "vectorize" the following loop:

set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.2         


        
4 Answers
  •  孤街浪徒
    2021-02-12 21:57

    This has nothing to do with memory block aliasing (a term I have never encountered before). Take a particular permutation example and walk through the assignments that would occur, regardless of the implementation at the C or assembly (or whatever) language level. It is intrinsic to how any sequential for-loop behaves versus how a "true" permutation (what one gets with x[sig]) occurs:

    sample(10)
     [1]  3  7  1  5  6  9 10  8  4  2
    
    value at 3 is copied to 1, and now there are two of those values
    value at 7 is copied to 2, and now there are two of those values
    value at 1 (which came from 3) is copied back to 3, so nothing changes
    

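    ... One can continue, but this already illustrates why the loop will usually not produce a "true" permutation and will only very rarely give a complete redistribution of values. A duplicate appears whenever sig[i] < i, because position sig[i] already holds its final value when it is read, so I believe only the identity permutation (1:10) leaves all of the x's unique. The reversed order 10:1 does not, since the second half of the loop reads values that the first half has already overwritten.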
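
    A minimal sketch of the same effect, using the permutation printed above and made-up, all-distinct values for x (x0, vec, loop and safe are names I am assuming here, they are not from the question). It compares x[sig] with the in-place loop and with a loop that reads from an untouched copy:

    ```r
    # Permutation taken from the sample(10) output above
    sig <- c(3, 7, 1, 5, 6, 9, 10, 8, 4, 2)
    # Hypothetical values, all distinct, so duplicates are easy to spot
    x0 <- (1:10) * 10

    # Vectorized: every read sees the original values
    vec <- x0[sig]

    # In-place loop: later reads can see values that were already overwritten
    loop <- x0
    for (i in seq_along(sig)) loop[i] <- loop[sig[i]]

    # Loop reading from the untouched copy x0: matches the vectorized result
    safe <- x0
    for (i in seq_along(sig)) safe[i] <- x0[sig[i]]

    identical(vec, safe)    # TRUE
    identical(vec, loop)    # FALSE
    sum(duplicated(vec))    # 0
    sum(duplicated(loop))   # 3
    ```

    For this particular sig the in-place loop loses the values 10, 20 and 40, which is why three duplicates show up.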

    replicate( 100, {x <- round(runif(10), 2); 
                      sig <- sample.int(10); 
                      for (i in seq_along(sig)){ x[i] <- x[sig[i]]}; 
                      sum(duplicated(x)) } )
     #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
     #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
     #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5
    

    I started wondering what the distribution of duplication counts might be in a large series. Looks pretty symmetric:

    table( replicate( 1000000, {x <- round(runif(10), 5); 
                                sig <- sample.int(10); 
                                for (i in seq_along(sig)){ x[i] <- x[sig[i]]}; 
                                sum(duplicated(x)) } ) )
    
         0      1      2      3      4      5      6      7      8 
         1    269  13113 126104 360416 360827 125707  13269    294 
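
    The lone zero in that table can be checked exhaustively on a smaller case. This is only a sketch: perms() is a hypothetical helper written for this illustration (not a base R function), n = 5 is an arbitrary choice to keep the enumeration small, and x0 holds made-up distinct values.

    ```r
    # Hypothetical helper: all permutations of v, one per row
    perms <- function(v) {
      if (length(v) <= 1) return(matrix(v, nrow = 1))
      out <- NULL
      for (k in seq_along(v)) {
        out <- rbind(out, cbind(v[k], perms(v[-k])))
      }
      out
    }

    n <- 5
    x0 <- (1:n) * 10          # distinct values, so any duplicate comes from the loop
    all_sig <- perms(1:n)     # 120 rows for n = 5

    # Run the in-place loop for every permutation and count duplicates
    dup_count <- apply(all_sig, 1, function(sig) {
      x <- x0
      for (i in seq_along(sig)) x[i] <- x[sig[i]]
      sum(duplicated(x))
    })

    table(dup_count)
    # Permutations that leave every value unique
    all_sig[dup_count == 0, , drop = FALSE]
    ```

    For n = 5 exactly one of the 120 permutations leaves no duplicates, and it is the identity 1:5.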
    
