Create several dummy variables from one string variable

Submitted by 自作多情 on 2019-11-29 15:51:06

Because the values in your column are concatenated character strings (not concatenated numbers), you need to add type = "character" for the function to work as you expect.

The function defaults to numeric values, hence the errors about NaN and so on.

The function has been renamed to be consistent with the short forms of the other functions in the same family: it is now cSplit_e (the old name, concat.split.expanded, still works).

library(splitstackshape)
cSplit_e(profs, "teaches", ",", type = "character", fill = 0)
#         teaches teaches_1st teaches_2nd teaches_3rd
# 1           1st           1           0           0
# 2      1st, 2nd           1           1           0
# 3      2nd, 3rd           0           1           1
# 4 1st, 2nd, 3rd           1           1           1
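
If it helps to see what the character mode is doing, the same table can be built in base R. This is only a minimal sketch, not how cSplit_e is implemented: the profs data frame is reconstructed from the output above (the original question's data is not shown here), and the names parts, lvls, dummies and result are made up for illustration.

```r
# Hypothetical data, reconstructed from the output shown above
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"),
                    stringsAsFactors = FALSE)

# Split each entry on commas and trim the surrounding whitespace
parts <- lapply(strsplit(as.character(profs$teaches), ","), trimws)

# Every distinct value becomes one indicator column
lvls <- sort(unique(unlist(parts)))

# One row per observation, one 0/1 column per distinct value
dummies <- do.call(rbind, lapply(parts, function(p) as.integer(lvls %in% p)))
colnames(dummies) <- paste0("teaches_", lvls)

result <- cbind(profs, dummies)
result
```

Because the values are matched as whole tokens after splitting, nothing here tries to convert them to numbers, which is the step that fails in the default numeric mode.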

The help page for ?concat.split.expanded is the same as that of cSplit_e. If you have any suggestions for making it easier to understand, please raise an issue on the package's GitHub page.

This is another option:

Vectorize(grepl, 'pattern')(c('1st', '2nd', '3rd'), profs$teaches)
#        1st   2nd   3rd
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE  TRUE FALSE
# [3,] FALSE  TRUE  TRUE
# [4,]  TRUE  TRUE  TRUE
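
A variation on the same idea, in case 0/1 columns attached to the data are wanted rather than a logical matrix. This is a sketch under assumptions: profs is again reconstructed from the outputs on this page, and lvls, hits and dummies are illustrative names. Note that grepl does substring matching, so this only works when no value is contained in another (for example, "1st" would also match "11st").

```r
# Hypothetical data, reconstructed from the outputs shown on this page
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"),
                    stringsAsFactors = FALSE)
lvls <- c("1st", "2nd", "3rd")

# fixed = TRUE treats each value as a literal string, not a regular expression
hits <- sapply(lvls, grepl, x = profs$teaches, fixed = TRUE)

# Adding 0L turns TRUE/FALSE into 1/0 while keeping the matrix shape
dummies <- hits + 0L
colnames(dummies) <- paste0("teaches_", lvls)

cbind(profs, dummies)
```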

You could try mtabulate from qdapTools:

library(qdapTools)
res <- mtabulate(strsplit(as.character(profs$teaches), ', '))
colnames(res) <- paste0('teaches', colnames(res))
res
#    teaches1st teaches2nd teaches3rd
#1          1          0          0
#2          1          1          0
#3          0          1          1
#4          1          1          1

Or using stringi:

library(stringi)
(vapply(c('1st', '2nd', '3rd'), stri_detect_fixed, logical(4L),
        str = profs$teaches)) + 0L
#     1st 2nd 3rd
#[1,]   1   0   0
#[2,]   1   1   0
#[3,]   0   1   1
#[4,]   1   1   1
Waldir Leoncio

I've found a workaround. It seems that concat.split.expanded works if you have a string variable containing nothing but separators and numbers, i.e.:

> profs <- data.frame(teaches = c("1", "1, 2", "2, 3", "1, 2, 3"))
> profs
  teaches
1       1
2    1, 2
3    2, 3
4 1, 2, 3

Now concat.split.expanded works as it does in the question "Dummy variables from a string variable":

> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
  teaches_1 teaches_2 teaches_3
1         1         0         0
2         1         1         0
3         0         1         1
4         1         1         1

However, I'm still looking for a solution which doesn't involve removing all letters from my teaches variable.
