I have a fairly large data frame in R with two columns. I am trying to create dummy variables from the Code column (a factor with 858 levels). The problem is that RStudio always crashes when I try to do that.
> str(d)
'data.frame': 649226 obs. of 2 variables:
$ User: int 210 210 210 210 269 317 317 317 317 326 ...
$ Code : Factor w/ 858 levels "AA02","AA03",..: 164 494 538 626 464 496 435 464 475 163 ...
The User column is not unique, meaning that several rows can share the same User. It doesn't matter whether the result keeps the same number of rows, or whether rows with the same User are merged into a single row in which several columns hold the counts of each Code.
I found a couple of solutions that work for a smaller dataset, but not for mine.
I tried using model.matrix, but RStudio just crashes:
m <- model.matrix(~ Code, data = d)
I tried a for loop with ifelse, but the code ran for 4 hours and then I noticed that RStudio had crashed:
for (t in unique(d$Code)) { d[paste("Code", t, sep = "")] <- ifelse(d$Code == t, 1, 0) }
(Found here: Create new dummy variable columns from categorical variable)
It would be great if you could recommend a method that is fast and works for this type of data.
Thanks!
This worked for me perfectly:
library(reshape2)
m <- acast(data = d, User ~ Code)
The only issue was that it produced NAs instead of 0s, but that is easy to change:
m[is.na(m)] <- 0
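For anyone who would rather avoid the extra package, here is a minimal base R sketch of the same User-by-Code layout. It is not the code from the answer above: it swaps in table(), and the small data frame is a made-up stand-in for the real d. Cells hold counts, and combinations that never occur are already 0, so no NA replacement step is needed.

```r
# Toy stand-in for the real data frame d (values are made up for illustration)
d <- data.frame(
  User = c(210L, 210L, 210L, 269L, 317L),
  Code = factor(c("AA02", "AA03", "AA02", "AA03", "AB01"))
)

# Cross-tabulate User against Code: one row per distinct User,
# one column per Code level, each cell counting the matching rows.
tab <- table(User = d$User, Code = d$Code)

# Drop the "table" class to get a plain integer matrix with dimnames
m <- unclass(tab)

# Optional: turn counts into 0/1 indicators (1 if the User has the Code at all)
ind <- (m > 0) * 1L

print(m)
print(ind)
```

The result is still a dense matrix with one row per distinct User and 858 columns, so on the real data it may remain heavy. If memory is the bottleneck, xtabs(~ User + Code, data = d, sparse = TRUE) should return the same counts as a sparse matrix, assuming the Matrix package is available.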
Source: https://stackoverflow.com/questions/22286466/r-expanding-an-r-factor-into-dummy-columns-for-every-factor-level