Add column to data frame based on long list and values in another column is too slow

野的像风 2021-01-27 14:05

I am adding a new column to a data frame using apply() and mutate. It works, but it is very slow. I have 24M rows and I am adding a column based on values in a long (58

1 answer
  • 2021-01-27 14:52

    Try one of these alternatives. Using lapply:

    large_df$new_c <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))
    

    Or using sapply:

    large_df$new_c <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0
    

    Both return:

    large_df
    #  A B C new_c
    #1 1 a e  TRUE
    #2 2 b f  TRUE
    #3 3 c g  TRUE
    #4 4 d h FALSE
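
    For anyone who wants to run this end to end, here is a minimal reproducible sketch of the two approaches above. The sample `large_df` and the contents of `long_list` are assumptions picked to match the printed output; the question does not show the real data.

    ```r
    # Hypothetical sample data (assumed); the real large_df has about 24M rows.
    large_df <- data.frame(
      A = 1:4,
      B = c("a", "b", "c", "d"),
      C = c("e", "f", "g", "h"),
      stringsAsFactors = FALSE
    )
    # Hypothetical lookup values (assumed; not given in the question).
    long_list <- c("a", "f", "c")

    # lapply: one logical vector per column, combined element-wise with |
    via_lapply <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))

    # sapply: logical matrix (one column per data column);
    # a row matches if at least one of its cells is TRUE
    via_sapply <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0

    large_df$new_c <- via_lapply
    print(large_df)
    ```

    Both versions call `%in%` once per column on the whole vector instead of once per row, which is probably where most of the gain over a row-wise apply() comes from. Note that the sapply version assumes more than one row, since sapply only simplifies to a matrix in that case.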
    