refactor data.frame column values

小蘑菇 2021-01-16 20:53

Sorry if this is a beginner's question. I need help with how to loop over my data frame. Here is some sample data.

a <- c(10:29);
b <- c(40:59);
e <- rep(1,         


        
3 Answers
  • 2021-01-16 21:11

    I would use cut() for this:

    test$e = cut(test$a, 
                 breaks = c(0, 15, 20, 25, 30), 
                 labels = c(1, 2, 3, 4))
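
    A minimal sketch of the same call on the question's vectors, assuming `test` is a data frame built from `a` and `b` (the question's own construction is cut off, so the first line below is a guess). It also shows that cut() with labels returns a factor, while `labels = FALSE` returns plain integer codes, which may be more convenient if `e` should stay numeric.

```r
# Assumed reconstruction of the sample data; the original definition is truncated.
test <- data.frame(a = 10:29, b = 40:59)

breaks <- c(0, 15, 20, 25, 30)

# With explicit labels, cut() returns a factor whose levels are "1".."4".
e_factor <- cut(test$a, breaks = breaks, labels = c(1, 2, 3, 4))

# With labels = FALSE, cut() returns the integer bin codes directly.
test$e <- cut(test$a, breaks = breaks, labels = FALSE)

# Number of rows per bin.
table(test$e)
```

    Both variants use right-closed intervals by default, so 15 falls in bin 1 and 20 in bin 2.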
    

    If you want to "generalize" the cut, that is, if you don't know in advance how many bins of width 5 (levels) you will need, you can take a two-step approach using c() and seq():

    test$e = cut(test$a, 
                 breaks = c(0, seq(from = 15, to = max(test$a)+5, by = 5)))
    levels(test$e) = seq_along(levels(test$e))
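
    To see how the breaks grow with the data, here is a small sketch on a hypothetical vector `x` (not from the question) whose values reach past 30. The names `x`, `breaks` and `bins` are illustrative only.

```r
# Hypothetical values reaching past 30, to show the breaks growing with the data.
x <- c(12, 18, 23, 29, 33, 41)

# First break at 0, then steps of 5 from 15 up to just beyond the maximum.
breaks <- c(0, seq(from = 15, to = max(x) + 5, by = 5))
breaks

bins <- cut(x, breaks = breaks)

# Relabel the interval levels as 1, 2, 3, ...
levels(bins) <- seq_along(levels(bins))
bins
```

    Bins that contain no values are still kept as levels, so the number of levels depends on the range of the data rather than on which bins are actually used.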
    

    Since Backlin beat me to the cut() solution, here's another option (which I don't prefer in this case, but am posting just to demonstrate the many options available in R).

    Use recode() from the car package.

    library(car)
    # The ranges share endpoints; recode() applies the first matching rule, so 15 maps to 1.
    test$e = recode(test$a, "0:15 = 1; 15:20 = 2; 20:25 = 3; 25:30 = 4")
    
  • 2021-01-16 21:17
    data.frame(a, b, e=(1:4)[cut(a, c(-Inf, 15, 20, 25, 30))])
    

    Update:

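    Greg's comment provides a more direct solution that avoids indexing an integer vector with the factor returned by cut(). Note that findInterval() uses left-closed intervals by default, so values lying exactly on a break (15, 20, 25) land one bin higher than with cut().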

    data.frame(a, b, e=findInterval(a, c(-Inf, 15, 20, 25, 30)))
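
    The boundary behaviour of the two calls can be checked directly. The sketch below compares them on the question's `a` and also uses the `left.open = TRUE` argument of findInterval() (available in R 3.3.0 and later, as far as I know) to reproduce cut()'s right-closed bins. The variable names are illustrative.

```r
a <- 10:29
breaks <- c(-Inf, 15, 20, 25, 30)

# cut() uses right-closed intervals (lo, hi] by default.
via_cut <- as.integer(cut(a, breaks))

# findInterval() uses left-closed intervals [lo, hi) by default.
via_interval <- findInterval(a, breaks)

# left.open = TRUE switches findInterval() to (lo, hi], matching cut().
via_left_open <- findInterval(a, breaks, left.open = TRUE)

# Only the values sitting exactly on a break differ between the first two.
a[via_cut != via_interval]   # 15 20 25
```

    Which variant is right depends on which side of each break the boundary values should fall.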
    
  • 2021-01-16 21:20

    You don't need a loop. You have nearly all you need:

    test[test$a > 15 & test$a < 20, "e"] <- 2
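
    Extending the same idea to every range might look like the sketch below. The construction of `test` is an assumption (the question's code is truncated), and so is the boundary handling: here 20 and 25 are placed in the lower bin to match the cut() answers, whereas the line above uses a strict `< 20`.

```r
# Assumed reconstruction of the sample data; the original definition is truncated.
test <- data.frame(a = 10:29, b = 40:59, e = 1)

# One vectorised assignment per range; rows that match none keep e = 1.
# Which side 15, 20 and 25 fall on is an assumption, not taken from the question.
test[test$a > 15 & test$a <= 20, "e"] <- 2
test[test$a > 20 & test$a <= 25, "e"] <- 3
test[test$a > 25, "e"] <- 4

test$e
```

    Each assignment works on the whole column at once, so no explicit loop is needed.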
    