I'm trying to categorise my data into different groups based on the type of data. My data and code are as follows:
bank ROE
bank1 0.73
bank2 0.94
bank3 0.62
b
You should use the %in% operator instead of ==, because you are comparing against a vector here.
Like so:
test$type <- ifelse(test$bank %in% sob, 1, ifelse(test$bank %in% fob, 2, ifelse(test$bank %in% jov, 3, 4)))
> test
bank ROE type
1 bank1 0.73 1
2 bank2 0.94 1
3 bank3 0.62 1
4 bank4 0.57 2
5 bank5 0.31 2
6 bank6 0.53 2
7 bank7 0.39 3
8 bank8 0.01 3
9 bank9 0.16 3
10 bank10 0.51 3
11 bank11 0.84 3
12 bank12 0.18 4
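For anyone who wants to run this end to end, here is a minimal self-contained sketch. The question's definitions of sob, fob and jov are not shown above, so the vectors below are an assumption reconstructed from the printed output.

```r
# Group vectors inferred from the output above (assumption, not the asker's code).
sob <- c("bank1", "bank2", "bank3")
fob <- c("bank4", "bank5", "bank6")
jov <- c("bank7", "bank8", "bank9", "bank10", "bank11")

test <- data.frame(
  bank = paste0("bank", 1:12),
  ROE  = c(0.73, 0.94, 0.62, 0.57, 0.31, 0.53, 0.39, 0.01, 0.16, 0.51, 0.84, 0.18),
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per element of test$bank, so every ifelse()
# in the chain receives a logical vector of length 12.
test$type <- ifelse(test$bank %in% sob, 1,
             ifelse(test$bank %in% fob, 2,
             ifelse(test$bank %in% jov, 3, 4)))

print(test)
```

Any bank that appears in none of the three vectors falls through to the final value, 4.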
Alternatively, to avoid the cumbersome if-else structures, you could do the classification by resetting the levels of a factor.
First, copy the bank variable as a factor (since R 4.0, data.frame() no longer converts strings to factors by default): test$type <- factor(test$bank)
Then reset the levels, using the vectors defined above (sob, fob, jov). Notice the last step: 'other'
is set to the remaining value, because bank12 is not included in any of the other vectors.
levels(test$type) <- list('sob' = sob,
'fob' = fob,
'jov' = jov,
'other' = 'bank12')
Resulting in
> test
bank ROE type
1 bank1 0.73 sob
2 bank2 0.94 sob
3 bank3 0.62 sob
4 bank4 0.57 fob
5 bank5 0.31 fob
6 bank6 0.53 fob
7 bank7 0.39 jov
8 bank8 0.01 jov
9 bank9 0.16 jov
10 bank10 0.51 jov
11 bank11 0.84 jov
12 bank12 0.18 other
The == operator in your code compares the vector test$bank with the vector jov element by element, recycling the shorter one. As these vectors are of different lengths (12 and 5), and the length of the longer vector is not a multiple of the length of the shorter one, unlike in the case of sob (of length 3), you get a warning message.
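A small sketch of the difference, using a standalone vector named bank (a stand-in for test$bank) and the jov values inferred from the output earlier in the thread:

```r
bank <- paste0("bank", 1:12)
jov  <- c("bank7", "bank8", "bank9", "bank10", "bank11")

# == pairs bank[i] with the recycled jov: bank[1] vs jov[1], ...,
# bank[6] vs jov[1] again, and so on. Because 12 is not a multiple of 5,
# R also warns; the warning is suppressed here to keep the output clean.
eq <- suppressWarnings(bank == jov)
which(eq)

# %in% asks, for each element of bank, whether it occurs anywhere in jov.
which(bank %in% jov)
```

With these values the positional comparison never lines a bank up with itself, so == finds no matches at all, whereas %in% picks out positions 7 to 11.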
To evaluate whether a value is equal to any of the values in a vector, you can use the %in% operator, just as @ako suggests. However, when working with groups, factor and levels are useful functions. Specify the variable as a factor, then set new levels.
test <- data.frame(
bank = c('bank1','bank2','bank3','bank4','bank5','bank6','bank7','bank8','bank9','bank10','bank11','bank12'),
ROE = c(0.73,0.94,0.62,0.57,0.31,0.53,0.39,0.01,0.16,0.51,0.84,0.18)
)
test$bank <- factor(test$bank)
levels(test$bank) <- list(
'1' = c('bank1', 'bank2','bank3'),
'2' = c('bank4','bank5', 'bank6'),
'3' = c('bank7', 'bank8','bank9', 'bank10','bank11'),
'other' = NA
)
test$bank[is.na(test$bank)] <- 'other'
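Note that the code above overwrites the bank column with the group labels. If you want to keep the original names, a variation along these lines should work; the column name group is just a placeholder chosen for this sketch.

```r
test <- data.frame(bank = paste0("bank", 1:12), stringsAsFactors = FALSE)

# Build the factor in a new column (hypothetical name `group`) so that
# test$bank keeps the original bank names.
test$group <- factor(test$bank)
levels(test$group) <- list(
  "1"     = c("bank1", "bank2", "bank3"),
  "2"     = c("bank4", "bank5", "bank6"),
  "3"     = c("bank7", "bank8", "bank9", "bank10", "bank11"),
  "other" = NA
)

# Banks not listed above end up as NA; move them into the 'other' level.
test$group[is.na(test$group)] <- "other"

table(test$group)
```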
You could also try:
lst1 <- list(sob, fob, jov)
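# as.character() matters if bank is a factor: indexing by a factor would use its integer codes, not the names
test$type <- unname(setNames(rep(seq_along(lst1), sapply(lst1, length)), unlist(lst1))[as.character(test$bank)])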
test$type[is.na(test$type) ] <- 4
test$type
#[1] 1 1 1 2 2 2 3 3 3 3 3 4
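The one-liner is easier to follow when split into steps. This is only a sketch: sob, fob and jov are reconstructed from the output shown earlier in the thread, and codes, lookup and type are names introduced here for illustration.

```r
# Group vectors inferred from the earlier output (assumption).
sob <- c("bank1", "bank2", "bank3")
fob <- c("bank4", "bank5", "bank6")
jov <- c("bank7", "bank8", "bank9", "bank10", "bank11")
lst1 <- list(sob, fob, jov)
bank <- paste0("bank", 1:12)

# Group number repeated once per member: 1 1 1 2 2 2 3 3 3 3 3
codes <- rep(seq_along(lst1), lengths(lst1))

# Name each code after its bank, giving a named lookup vector.
lookup <- setNames(codes, unlist(lst1))

# Indexing by name returns the code; names not in the lookup give NA.
type <- unname(lookup[bank])
type[is.na(type)] <- 4L
type
```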