I've tried pretty much everything from this similar question, but I can't get the results everyone else seems to be getting. This is my problem:
I have a data frame like this, listing the grades each teacher works with:
> profs <- data.frame(teaches = c("1st", "1st, 2nd",
"2nd, 3rd",
"1st, 2nd, 3rd"))
> profs
teaches
1 1st
2 1st, 2nd
3 2nd, 3rd
4 1st, 2nd, 3rd
I've been looking for solutions to break the teaches variable into columns, like so:
teaches1st teaches2nd teaches3rd
1 1 0 0
2 1 1 0
3 0 1 1
4 1 1 1
I understand that this solution, which uses the splitstackshape library and the apparently deprecated concat.split.expanded function, is supposed to do exactly what I want, given the answerer's explanation. However, I can't seem to get the same results:
> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
Error in seq.default(min(vec), max(vec)) :
'from' cannot be NA, NaN or infinite
Using cSplit, which I understand supersedes "most of the earlier concat.split* functions", I get this:
> cSplit(profs, "teaches")
teaches_1 teaches_2 teaches_3
1: 1st NA NA
2: 1st 2nd NA
3: 2nd 3rd NA
4: 1st 2nd 3rd
I've read cSplit's help page and tried tweaking every one of its parameters, but I just can't get that split. I appreciate any help.
Since your concatenated data are character strings (not numeric values), you'll need to add type = "character" to get the function to work as you expect.
The function's default setting is for numeric values, hence the error about NaN and so on.
The naming has been made more consistent with the short forms of the other functions in the same family. Thus, it is now cSplit_e (though the old function name would still work).
library(splitstackshape)
cSplit_e(profs, "teaches", ",", type = "character", fill = 0)
# teaches teaches_1st teaches_2nd teaches_3rd
# 1 1st 1 0 0
# 2 1st, 2nd 1 1 0
# 3 2nd, 3rd 0 1 1
# 4 1st, 2nd, 3rd 1 1 1
The help page for ?concat.split.expanded is the same as that of cSplit_e. If you have any tips on making it clearer to understand, please raise an issue at the package's GitHub page.
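For comparison, the same indicator table can be built without any package. This is only a minimal base R sketch, assuming the values are separated by a comma plus optional whitespace; the names parts, lvls and dummies are illustrative, and the "teaches_" prefix simply mirrors the cSplit_e output above.

```r
# Example data from the question
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"))

# Split each row on a comma followed by optional whitespace (assumed separator)
parts <- strsplit(as.character(profs$teaches), ",\\s*")

# Distinct grade labels across all rows, sorted
lvls <- sort(unique(unlist(parts)))

# One row per teacher, one 0/1 column per grade label
dummies <- do.call(rbind, lapply(parts, function(p) as.integer(lvls %in% p)))
colnames(dummies) <- paste0("teaches_", lvls)
dummies
```

Because the split pieces are compared as whole strings with %in%, a label only counts when it appears as a complete entry in the row.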
This is another option:
Vectorize(grepl, 'pattern')(c('1st', '2nd', '3rd'), profs$teaches)
# 1st 2nd 3rd
# [1,] TRUE FALSE FALSE
# [2,] TRUE TRUE FALSE
# [3,] FALSE TRUE TRUE
# [4,] TRUE TRUE TRUE
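One caveat with pattern-matching approaches like this: grepl matches substrings, so with a hypothetical label such as "21st" in the data, the pattern "1st" would match it too. A hedged sketch that anchors each label at word boundaries (the grades vector and the res name are illustrative, and the labels are assumed to contain no regex metacharacters):

```r
# Example data from the question
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"))
grades <- c("1st", "2nd", "3rd")

# \\b anchors each label at word boundaries, so "1st" does not match inside "21st"
res <- sapply(grades, function(g) {
  grepl(paste0("\\b", g, "\\b"), profs$teaches, perl = TRUE)
})

# Adding 0L converts the logical matrix to 0/1 integers
res + 0L
```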
You could try mtabulate from qdapTools:
library(qdapTools)
res <- mtabulate(strsplit(as.character(profs$teaches), ', '))
colnames(res) <- paste0('teaches', colnames(res))
res
# teaches1st teaches2nd teaches3rd
#1 1 0 0
#2 1 1 0
#3 0 1 1
#4 1 1 1
Or using stringi
library(stringi)
(vapply(c('1st', '2nd', '3rd'), stri_detect_fixed, logical(4L),
str=profs$teaches))+0L
# 1st 2nd 3rd
#[1,] 1 0 0
#[2,] 1 1 0
#[3,] 0 1 1
#[4,] 1 1 1
I've found a workaround. It seems that concat.split.expanded works if you have a string variable containing nothing but separators and numbers, i.e.:
> profs <- data.frame(teaches = c("1", "1, 2", "2, 3", "1, 2, 3"))
> profs
teaches
1 1
2 1, 2
3 2, 3
4 1, 2, 3
Now concat.split.expanded works as shown in "Dummy variables from a string variable":
> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
teaches_1 teaches_2 teaches_3
1 1 0 0
2 1 1 0
3 0 1 1
4 1 1 1
However, I'm still looking for a solution which doesn't involve removing all letters from my teaches variable.
Source: https://stackoverflow.com/questions/29101708/create-several-dummy-variables-from-one-string-variable