Using do loops in R to create new variables

别那么骄傲 2020-12-20 00:56

I'm a long-time SAS programmer looking to make the jump to R. I know R isn't all that great for variable recoding, but is there a way to do this with do loops?

If

6 Answers
  • 2020-12-20 01:46

    This is really late, but you can actually do this without loops or *apply. I'm assuming that the variables are columns in a data frame (which makes sense if the OP is familiar with SAS datasets and macros).

    df[paste("c", 1:100, sep="_")] <- df[paste("a", 1:100, sep="_")] +
                                      df[paste("b", 1:100, sep="_")]
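    A quick way to check this on a small scale, assuming a toy data frame `df` with three numbered pairs of columns instead of 100 (the values are invented for illustration):

    ```r
    # Toy stand-in for a SAS dataset: a_1..a_3 and b_1..b_3 (hypothetical values)
    n <- 3
    df <- data.frame(a_1 = 1:4, a_2 = 5:8, a_3 = 9:12,
                     b_1 = 10, b_2 = 20, b_3 = 30)

    # Build each set of column names once...
    a_cols <- paste("a", 1:n, sep = "_")
    b_cols <- paste("b", 1:n, sep = "_")
    c_cols <- paste("c", 1:n, sep = "_")

    # ...then add whole blocks of columns in one step; the c_* columns are created
    df[c_cols] <- df[a_cols] + df[b_cols]

    print(df[c_cols])
    ```

    Adding two data frames of the same shape works column by column, and assigning to column names that don't exist yet appends them as new columns, so no explicit loop is needed.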
    
  • 2020-12-20 01:52

    SAS uses a rudimentary macro language, which depends on text replacement rather than evaluation of expressions like any proper programming language. Your SAS files are essentially two things: SAS commands and macro expressions (things starting with '%'). Macro languages are highly problematic and hard to debug (for example, do expressions within expressions get expanded? Why do you have to write "&&x" or even "&&&x"? Why do you need two semicolons here?). It's clunky and inelegant compared with a well-designed programming language that is based on a single syntax.

    If your a_i variables are single numbers, then you should store them as a vector, e.g.:

    > a = 1:100
    > b = runif(100)
    

    Now I can get elements easily:

    > a[1]
    

    and add them element by element in a single vectorised operation:

    > c = a + b
    

    You could do it with a loop, initialising c first:

    > c = rep(0,100)
    > for(i in 1:100){
       c[i]=a[i]+b[i]
       }
    

    But that would be sloooooow.
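    To confirm that the loop and the vectorised version give the same result, and to compare their run times on your own machine, here is a base-R sketch. It uses larger vectors than the answer so the difference is visible; no timings are claimed here because they depend on the machine.

    ```r
    set.seed(1)                  # reproducible random numbers
    n <- 1e5
    a <- 1:n
    b <- runif(n)

    # Vectorised: a single call; the element-wise work runs in compiled code
    t_vec <- system.time(c_vec <- a + b)

    # Explicit loop, preallocating with numeric(n) so the result is not grown each step
    t_loop <- system.time({
      c_loop <- numeric(n)
      for (i in seq_len(n)) {
        c_loop[i] <- a[i] + b[i]
      }
    })

    print(rbind(vectorised = t_vec[["elapsed"]], loop = t_loop[["elapsed"]]))
    ```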

    Nearly every R beginner asks 'how do I create a variable a_i for some values of i', and then shortly afterwards they ask how to access variable a_i for some values of i. The answer is always to make a as either a vector or a list.
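    A minimal sketch of that advice with invented values: one named list holds what would otherwise be separate objects `a_1`, `a_2`, ..., and `paste0()` builds the name only when you need to look an element up. A list is used here because the elements have different types and lengths; for plain numbers a vector works the same way.

    ```r
    # One list instead of separate objects a_1, a_2, a_3 (hypothetical values)
    a <- list(1, 2:3, "x")
    names(a) <- paste0("a_", seq_along(a))

    # "Access a_i for some i": index by position or by the constructed name
    i <- 2
    print(a[[i]])
    print(a[[paste0("a_", i)]])

    # "Create a new a_i": just add another element
    a[["a_4"]] <- 4
    print(names(a))
    ```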

  • 2020-12-20 01:54

    I suspect that if you have one hundred variables a_1, a_2, ..., a_100, all of your variables are related. In fact, if you want to do

    c_1 = a_1 + b_1
    

    then a, b, c are related. Therefore, I recommend that you combine all of your variables into a single data frame, where one column is a and another is b.
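    If the variables already exist as separate objects, one way to follow this advice is `mget()`, which returns the objects named in a character vector as a list. The sketch below assumes scalar objects `a_1`..`a_5` and `b_1`..`b_5` with made-up values; the setup loop only simulates such a workspace.

    ```r
    # Simulated workspace with free variables a_1..a_5 and b_1..b_5 (hypothetical)
    for (i in 1:5) {
      assign(paste0("a_", i), i)
      assign(paste0("b_", i), 10 * i)
    }

    # Gather them by name, in numeric order, into one data frame
    idx <- 1:5
    dat <- data.frame(
      a = unlist(mget(paste0("a_", idx))),
      b = unlist(mget(paste0("b_", idx))),
      row.names = NULL
    )

    # "c_i = a_i + b_i" becomes a single column operation
    dat$c <- dat$a + dat$b
    print(dat)
    ```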

    The question is how to combine your variables in a sensible way. To give a useful answer, though, could you tell us how these variables are created?


    Perhaps this isn't suitable for your case. If not, a bit more information would be useful.

  • 2020-12-20 01:56

    This is actually a pretty interesting question. From my reading and recent (forced) use of SAS, the question seems to be about recoding variables in a SAS dataset within a data step using a bit of macro code. Otherwise, if they were free variables being created, they would start with a & character. I think the example code would be better represented like this:

    %macro recodevars;
    data test;
      set test;
    
      %do i=1 %to 100;
      c_&i = a_&i + b_&i;
      %end;
    
    run;
    %mend recodevars;
    %recodevars;
    

    You could do something similar in R like this example:

    test <- data.frame(vara1=1:10,varb1=2:11,vara2=3:12,varb2=4:13)
    
    test[paste0("varc",1:2)] <- test[paste0("vara",1:2)] + test[paste0("varb",1:2)]
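    To avoid hard-coding `1:2`, here is a sketch that derives the numeric suffixes from the existing column names, so the same assignment works for any number of pairs. It assumes the `varaN`/`varbN` naming scheme of this example.

    ```r
    test <- data.frame(vara1 = 1:10, varb1 = 2:11, vara2 = 3:12, varb2 = 4:13)

    # Suffixes of every "vara<N>" column, e.g. "1" and "2"
    suffix <- sub("^vara", "", grep("^vara[0-9]+$", names(test), value = TRUE))

    # Keep only suffixes that also have a matching "varb<N>" column
    suffix <- suffix[paste0("varb", suffix) %in% names(test)]

    test[paste0("varc", suffix)] <- test[paste0("vara", suffix)] +
                                    test[paste0("varb", suffix)]
    head(test)
    ```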
    

    I'd be curious to hear what insight others have on the question when it applies to a data frame rather than free variables.

  • 2020-12-20 02:00

    This stuff is trivial. To me, it looks like you want to find a way to create commands automatically and execute them. Easy peasy.

    For instance, this assigns to C_i the value in A_i:

    for(i in 1:100){
        tmpCmd = paste("C_",i,"= A_",i, sep = "")
        eval(parse(text = tmpCmd))
    }
    rm(i, tmpCmd)
    

    Just remember eval(parse(text = ...)) and paste(), and you're off to the races creating loops of commands to execute.

    You can then add in the operation you'd like to do, i.e. the summation with B_i, by swapping in this line:

        tmpCmd = paste("C_",i,"= A_",i," + B_",i, sep = "")
    

    However, others are right that good data structures let you avoid a lot of tedious work like this. Still, when you do need it, such repetitive code isn't hard to devise.
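    For comparison, the same loop can be written without building and parsing source text: `assign()` creates a variable from a name string and `get()` fetches one. A sketch with five hypothetical `A_i`/`B_i` pairs instead of 100:

    ```r
    # Hypothetical inputs A_1..A_5 and B_1..B_5
    for (i in 1:5) {
      assign(paste0("A_", i), i)
      assign(paste0("B_", i), i^2)
    }

    # C_i = A_i + B_i: names are looked up with get(), nothing is parsed
    for (i in 1:5) {
      assign(paste0("C_", i), get(paste0("A_", i)) + get(paste0("B_", i)))
    }
    rm(i)

    print(C_3)
    ```

    This still scatters the results across many separate objects, so the data-structure advice in the other answers applies just as much.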

  • 2020-12-20 02:00

    The R way would be to use lists.

    > a_1 = 1
    > a_2 = 2
    > a_3 = 3
    > a_4 = 4
    > a_5 = 5
    
    > b_1 = 1
    > b_2 = 2
    > b_3 = 3
    > b_4 = 4
    > b_5 = 5
    
    > a.list <- ls(pattern='^a_[0-9]+$')
    > a.list
    [1] "a_1" "a_2" "a_3" "a_4" "a_5"
    

    and define b.list the same way, with ls(pattern='^b_[0-9]+$').

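    if(length(a.list)==length(b.list)){
       c.list <- lapply(seq_along(a.list), function(x) eval(parse(text=a.list[x])) + eval(parse(text=b.list[x])))
    
       # name each result after its a_i suffix: ls() sorts names as text (a_10 before a_2)
       c.list.names <- sub('^a_', 'c_', a.list)
    
       # [[x]] extracts the value itself; c.list[x] would assign a one-element list
       invisible(lapply(seq_along(c.list), function(x) assign(c.list.names[x], c.list[[x]], envir=.GlobalEnv)))
    }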
    

    I can't think of a way to do this without the eval(parse(yuk)) and assign unless you follow csgillespie's advice (which is the right way!)
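    One possible way around the `eval(parse())` step, not taken from the answer itself: `mget()` returns the values of the named objects as a list, `Map()` adds the two lists pairwise, and `list2env()` writes the results back as `c_1`, `c_2`, and so on. This sketch recreates the five toy `a_i`/`b_i` objects so it runs on its own.

    ```r
    # Recreate the toy objects a_1..a_5 and b_1..b_5
    for (i in 1:5) {
      assign(paste0("a_", i), i)
      assign(paste0("b_", i), i)
    }

    # Build names in numeric order (ls() sorts as text, so a_10 would precede a_2)
    idx <- 1:5
    a.vals <- mget(paste0("a_", idx))   # named list: a_1, a_2, ...
    b.vals <- mget(paste0("b_", idx))

    # Add pairwise, rename to c_1..c_5, and store them in the global environment
    c.vals <- Map(`+`, a.vals, b.vals)
    names(c.vals) <- paste0("c_", idx)
    invisible(list2env(c.vals, envir = globalenv()))

    print(c_4)
    ```

    Keeping `c.vals` as a list and skipping the `list2env()` step is closer to the "use lists" approach this answer opens with.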
