Vectorize my thinking: Vector Operations in R

醉酒成梦 2020-12-24 09:30

So earlier I answered my own question on thinking in vectors in R. But now I have another problem which I can't 'vectorize.' I know vectors are faster and loops slower, b

3 Answers
  • 2020-12-24 09:55

    Here's what seems like another very R-type way to generate the sums. Generate a vector as long as your input vector, containing nothing but the total of its n elements, repeated. Then subtract your original vector from that sums vector. The result is a vector (isums) in which the ith entry is the sum of the original vector less its ith element.

    > (my.data$item[my.data$fixed==0])
    [1] 1 1 3 5 7
    > sums <- rep(sum(my.data$item[my.data$fixed==0]),length(my.data$item[my.data$fixed==0]))
    > sums
    [1] 17 17 17 17 17
    > isums <- sums - (my.data$item[my.data$fixed==0])
    > isums
    [1] 16 16 14 12 10
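
    Because R recycles a length-one value across a longer vector, the explicit rep() step can probably be skipped. A minimal sketch, assuming a hypothetical vector item0 that stands in for my.data$item[my.data$fixed==0]:

    ```r
    # Hypothetical stand-in for my.data$item[my.data$fixed == 0]
    item0 <- c(1, 1, 3, 5, 7)

    # sum(item0) is a single number; R recycles it across item0,
    # so each entry becomes "total minus this element".
    isums <- sum(item0) - item0
    print(isums)
    ```

    This should print the same leave-one-out sums as the transcript above (16 16 14 12 10), without building the intermediate sums vector.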
    
  • 2020-12-24 10:03

    Strangely enough, learning to vectorize in R is what helped me get used to basic functional programming. A basic technique would be to define your operations inside the loop as a function:

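    data = ...    # your data vector
    items = ...   # the item label for each element of data

    # Compute the statistic with every observation of item i left out
    leave_one_out = function(i) {
       data1 = data[items != i]
       delta = ...  # some operation on data1
       return(delta)   # return is a function in R, so it needs parentheses
    }

    delta.list = NULL   # start empty so cbind has something to append to
    for (j in items) {
       delta.list = cbind(delta.list, leave_one_out(j))
    }
    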
    

    To vectorize, all you do is replace the for loop with the sapply mapping function:

    delta.list = sapply(items, leave_one_out)
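
    To make the loop-to-sapply swap concrete, here is a small runnable sketch. The data, the use of positions as item labels, and the choice of sum() as the "some operation" are all assumptions made for illustration; they are not from the question.

    ```r
    # Hypothetical data: five values, labelled by their position
    data  <- c(1, 1, 3, 5, 7)
    items <- seq_along(data)

    # Assumed operation: sum of everything except item i
    leave_one_out <- function(i) {
      data1 <- data[items != i]
      sum(data1)
    }

    # Loop version: grow the result one element at a time
    delta.loop <- numeric(0)
    for (j in items) {
      delta.loop <- c(delta.loop, leave_one_out(j))
    }

    # sapply version: map the same function over items
    delta.apply <- sapply(items, leave_one_out)

    print(delta.loop)
    print(delta.apply)
    ```

    Both versions should give the same vector. Note that sapply still calls the function once per item, so the gain here is mostly in clarity rather than in raw speed.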
    
  • 2020-12-24 10:20

    This is no answer, but I wonder if any insight lies in this direction:

    > tapply((my.data$item[my.data$fixed==0])[-1], my.data$year[my.data$fixed==0][-1], sum)
    

    tapply produces a table of statistics (sums, in this case, as chosen by the third argument) grouped by the factor given as the second argument. For example:

    2001 2003 2005 2007
       1    3    5    7
    

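    The [-1] notation drops observation (row) one from the selected rows. So you could loop and use [-i] on each pass, iterating over the selected rows only and printing each table, since results are not auto-printed inside a for loop:

    item0 <- my.data$item[my.data$fixed==0]   # selected items
    year0 <- my.data$year[my.data$fixed==0]   # matching years
    for (i in seq_along(item0)) {
      print(tapply(item0[-i], year0[-i], sum))
    }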
    

    Keep in mind that if any year has only one observation, the tables returned by the successive tapply calls won't all have the same number of columns (e.g., if you drop the only observation for 2001, then 2003, 2005, and 2007 would be the only columns returned).
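
    The column-count caveat can be checked with a small sketch. The data frame below is hypothetical (the question's actual my.data is not shown here); it is chosen so that 2001 has two unfixed observations and the other years have one each. The second variant, which fixes the factor levels up front, is one possible workaround and not something the original answer proposes.

    ```r
    # Hypothetical data frame standing in for my.data
    my.data <- data.frame(
      item  = c(1, 1, 3, 5, 7, 9),
      year  = c(2001, 2001, 2003, 2005, 2007, 2007),
      fixed = c(0, 0, 0, 0, 0, 1)
    )

    keep  <- my.data$fixed == 0
    item0 <- my.data$item[keep]
    year0 <- my.data$year[keep]

    # One leave-one-out table per selected row
    loo.tables <- lapply(seq_along(item0), function(i) {
      tapply(item0[-i], year0[-i], sum)
    })
    ragged.lengths <- lengths(loo.tables)
    print(ragged.lengths)

    # Possible workaround: fix the year levels so every table keeps
    # all columns; a year with nothing left shows up as NA
    all.years <- sort(unique(year0))
    loo.fixed <- lapply(seq_along(item0), function(i) {
      tapply(item0[-i], factor(year0[-i], levels = all.years), sum)
    })
    fixed.lengths <- lengths(loo.fixed)
    print(fixed.lengths)
    ```

    With this data, dropping either 2001 row still leaves four year columns, while dropping the single 2003, 2005, or 2007 row leaves three. With fixed levels, every table has four columns and the emptied year is NA.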
