Categorizing data in R

逝去的感伤 2021-01-24 14:32

I'm trying to categorize my data into different groups based on the type of data. My data and code are as follows:

bank    ROE
bank1   0.73
bank2   0.94
bank3   0.62
b         


        
3 answers
  • 2021-01-24 15:02

    You should use the %in% operator instead of ==, because you are comparing against a vector here.

    Like so:

    test$type <- ifelse(test$bank %in% sob, 1,
                 ifelse(test$bank %in% fob, 2,
                 ifelse(test$bank %in% jov, 3, 4)))
    
    > test
         bank  ROE type
    1   bank1 0.73    1
    2   bank2 0.94    1
    3   bank3 0.62    1
    4   bank4 0.57    2
    5   bank5 0.31    2
    6   bank6 0.53    2
    7   bank7 0.39    3
    8   bank8 0.01    3
    9   bank9 0.16    3
    10 bank10 0.51    3
    11 bank11 0.84    3
    12 bank12 0.18    4
    

    Alternatively, to avoid the cumbersome nested ifelse() calls, you could do the classification by resetting the levels of a factor.

    First, copy the bank variable as a factor: test$type <- factor(test$bank) (the factor() call matters when bank is stored as character, which is the data.frame default since R 4.0).

    Then reset the levels using the vectors defined above (sob, fob, jov). Note the last entry: 'other' is mapped to the remaining value, bank12, because it does not appear in any of the other vectors.

    levels(test$type) <- list('sob' = sob,
                              'fob' = fob,
                              'jov' = jov,
                              'other' = 'bank12')
    

    Resulting in

    > test
         bank  ROE  type
    1   bank1 0.73   sob
    2   bank2 0.94   sob
    3   bank3 0.62   sob
    4   bank4 0.57   fob
    5   bank5 0.31   fob
    6   bank6 0.53   fob
    7   bank7 0.39   jov
    8   bank8 0.01   jov
    9   bank9 0.16   jov
    10 bank10 0.51   jov
    11 bank11 0.84   jov
    12 bank12 0.18 other
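
    A self-contained version of the factor approach might look like the sketch below. The definitions of sob, fob, and jov are not shown on this page, so they are assumptions reconstructed from the output above, and setdiff() is used here in place of hard-coding 'bank12'.

    ```r
    # Data as shown in the printed output above
    test <- data.frame(
      bank = paste0("bank", 1:12),
      ROE  = c(0.73, 0.94, 0.62, 0.57, 0.31, 0.53, 0.39, 0.01, 0.16, 0.51, 0.84, 0.18)
    )

    # Assumed group definitions (inferred from the output, not from the question)
    sob <- paste0("bank", 1:3)
    fob <- paste0("bank", 4:6)
    jov <- paste0("bank", 7:11)

    # Convert explicitly: levels<- with a list needs a factor
    test$type <- factor(test$bank)

    # Every level not claimed by a group is collected under 'other'
    grouped <- c(sob, fob, jov)
    levels(test$type) <- list(
      sob   = sob,
      fob   = fob,
      jov   = jov,
      other = setdiff(levels(test$type), grouped)
    )

    # Count of banks per group
    table(test$type)
    ```

    Computing the 'other' group with setdiff() means banks added later fall into it without editing the list by hand.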
    
  • 2021-01-24 15:09

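    The == operator in your code compares the vector test$bank element by element with the vector jov, recycling the shorter one. Because these vectors have different lengths (12 and 5) and the longer length is not a multiple of the shorter one, you get a warning message. With sob (length 3) no warning appears, since 12 is a multiple of 3, but the comparison is still positional rather than a membership test.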
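
    As a rough sketch of the difference between the two operators (sob and jov are not shown on this page, so their contents here are assumed from the groups listed in the answers):

    ```r
    bank <- paste0("bank", 1:12)
    sob  <- c("bank1", "bank2", "bank3")                       # assumed contents
    jov  <- c("bank7", "bank8", "bank9", "bank10", "bank11")   # assumed contents

    # == recycles the shorter vector and compares position by position.
    # With sob this happens to line up for the first three banks.
    eq_sob <- bank == sob

    # With jov, 12 is not a multiple of 5, so R also warns
    # (suppressed here so the sketch runs cleanly).
    eq_jov <- suppressWarnings(bank == jov)
    sum(eq_jov)    # 0: no position happens to match

    # %in% tests membership of each bank in jov, regardless of position
    in_jov <- bank %in% jov
    which(in_jov)  # 7 8 9 10 11
    ```

    The positional comparison finds none of the jov banks, while the membership test finds all five.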

    To test whether a value is equal to any of the values in a vector, you can use the %in% operator, just as @ako suggests. However, when working with groups, factor() and levels() are useful functions: specify the variable as a factor, then set new levels.

    test <- data.frame(
      bank = c('bank1','bank2','bank3','bank4','bank5','bank6','bank7','bank8','bank9','bank10','bank11','bank12'),
      ROE = c(0.73,0.94,0.62,0.57,0.31,0.53,0.39,0.01,0.16,0.51,0.84,0.18)
    )
    
    test$bank <- factor(test$bank)
    
    levels(test$bank) <- list(
      '1' = c('bank1', 'bank2','bank3'),
      '2' = c('bank4','bank5', 'bank6'),
      '3' = c('bank7', 'bank8','bank9', 'bank10','bank11'),
      'other' = NA
    )
    
    test$bank[is.na(test$bank)] <- 'other'
    
  • 2021-01-24 15:16

    You could also try:

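    lst1 <- list(sob, fob, jov)
    # Named lookup vector: names are bank IDs, values are group numbers
    lookup <- setNames(rep(seq_along(lst1), lengths(lst1)), unlist(lst1))
    # as.character() avoids indexing by factor codes; unname() drops the bank names
    test$type <- unname(lookup[as.character(test$bank)])
    test$type[is.na(test$type)] <- 4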
    
    test$type
    #[1] 1 1 1 2 2 2 3 3 3 3 3 4
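
    The lookup idea can be seen in isolation in the toy sketch below. The names lookup and keys, and the three-bank table, are illustrative and not part of the answer. Indexing a named vector by a character vector returns NA for names that are missing, which is what the final replacement step relies on.

    ```r
    # Toy lookup table: names are keys, values are group numbers
    lookup <- c(bank1 = 1, bank4 = 2, bank7 = 3)

    # "bank12" is deliberately absent from the table
    keys <- c("bank4", "bank12", "bank1")

    # Missing names come back as NA; unname() drops the key labels
    res <- unname(lookup[keys])

    # Anything unmatched falls into the catch-all group 4
    res[is.na(res)] <- 4
    res
    ```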
    