Question
I am adding a new column to a data frame using apply() and mutate(). It works, but unfortunately it is very slow. I have 24M rows, and the new column depends on whether values in other columns appear in a long list (58 items). That was bearable with a smaller list, but not anymore. Here is my example:
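library(dplyr)  # provides mutate()
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')
# For each row, TRUE if any value in columns B:C appears in long_list
large_df <- mutate(large_df, new_C = apply(large_df[, 2:3], 1,
                                           function(r) any(r %in% long_list)))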
The new column (new_C) will be TRUE or FALSE. It works, but I am looking for a faster alternative.
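A likely reason for the slowness: apply() first converts the selected columns to a character matrix and then calls the anonymous function once per row, so 24M rows means 24M interpreted R function calls. Because %in% is vectorised, it can instead be called once per column and the results combined with |. The sketch that follows is an illustration under assumptions, not the asker's data: `df`, `n_rows` and `lookup` are hypothetical stand-ins (random letters, 1e5 rows, a 10-item list). It checks that the per-column version gives the same result as the row-wise apply() version. Timings from system.time() depend on the machine, so no numbers are claimed here.

```r
set.seed(1)    # reproducible synthetic data
n_rows <- 1e5  # hypothetical size; the real data has ~24M rows
# Hypothetical stand-in for the real data: random letter codes in B and C
df <- data.frame(A = seq_len(n_rows),
                 B = sample(letters, n_rows, replace = TRUE),
                 C = sample(letters, n_rows, replace = TRUE))
lookup <- sample(letters, 10)  # hypothetical lookup list (the real one has 58 items)

# Row-wise: one R function call per row after coercing B:C to a matrix
t_apply <- system.time(
  rowwise <- apply(df[, 2:3], 1, function(r) any(r %in% lookup))
)
# Column-wise: one vectorised %in% call per column, combined with |
t_vec <- system.time(
  colwise <- df$B %in% lookup | df$C %in% lookup
)

identical(unname(rowwise), colwise)  # both approaches give the same logical vector
print(c(apply = t_apply[["elapsed"]], vectorised = t_vec[["elapsed"]]))
```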
Thank you so much. Serhiy
Bonus question: I couldn't select just one column within apply(); I needed a range. Why?
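One likely explanation for the bonus question, shown as a small sketch on the toy data from the question: indexing a single column with large_df[, 3] uses the default drop = TRUE, so the result is a plain vector with no dim attribute, and apply() requires an array or matrix-like input (it stops with an error about dim(X)). Passing drop = FALSE keeps a one-column data frame that apply() accepts, although for a single column the vectorised %in% on its own is enough. The variable names `res_err`, `new_C_single` and `new_C_vec` are illustrative only.

```r
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')

# Selecting one column drops the result to a plain vector without dimensions
is.null(dim(large_df[, 3]))  # TRUE

# apply() needs something with dimensions, so this errors; capture the message
res_err <- tryCatch(apply(large_df[, 3], 1, function(r) any(r %in% long_list)),
                    error = function(e) conditionMessage(e))
print(res_err)

# drop = FALSE keeps a one-column data frame, which apply() accepts
new_C_single <- apply(large_df[, 3, drop = FALSE], 1,
                      function(r) any(r %in% long_list))

# For a single column no apply() is needed: %in% is already vectorised
new_C_vec <- large_df$C %in% long_list
```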
Answer 1:
Try one of these alternatives, using lapply:
large_df$new_c <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))
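To unpack the lapply version step by step (a sketch on the question's toy data; `hits` is just an illustrative name): lapply() runs %in% once per selected column and returns a list of logical vectors, and Reduce() then folds `|` over that list, which for two columns is the same as writing B %in% long_list | C %in% long_list explicitly.

```r
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')

# Step 1: one vectorised %in% call per column -> named list of logical vectors
hits <- lapply(large_df[, 2:3], `%in%`, long_list)
str(hits)

# Step 2: Reduce combines the list elementwise with |, i.e. hits$B | hits$C
new_c <- Reduce(`|`, hits)

# The same result, written out explicitly for these two columns
explicit <- large_df$B %in% long_list | large_df$C %in% long_list
identical(new_c, explicit)  # TRUE
```

Because Reduce() works on a list of any length, the same line scales to more columns without writing out each | by hand.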
or sapply:
large_df$new_c <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0
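The sapply version works slightly differently, as this sketch on the toy data shows (`hit_mat` is an illustrative name): sapply() simplifies the per-column results into a logical matrix with one row per data-frame row and one column per selected column, rowSums() counts the TRUE values in each row (TRUE counts as 1), and > 0 turns that count back into "any column matched".

```r
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')

# Step 1: sapply simplifies the per-column %in% results into a 4 x 2 logical matrix
hit_mat <- sapply(large_df[, 2:3], `%in%`, long_list)
dim(hit_mat)  # 4 2

# Step 2: count matches per row; any count above zero means at least one hit
new_c <- rowSums(hit_mat) > 0
print(new_c)
```

One caveat worth knowing: if the data frame has a single row, each %in% result has length 1, so sapply() returns a plain vector instead of a matrix and rowSums() stops with an error. The Reduce() form does not have this edge case.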
Both return:
large_df
# A B C new_c
#1 1 a e TRUE
#2 2 b f TRUE
#3 3 c g TRUE
#4 4 d h FALSE
Source: https://stackoverflow.com/questions/62502950/add-column-to-data-frame-based-on-long-list-and-values-in-another-column-is-too