Create a matrix of dummy variables from my data frame; use `NA` for missing values

Submitted by 旧城冷巷雨未停 on 2019-12-24 01:05:12

Question


I have data recorded for different years, with the years repeated several times. I want the output to have one column per year, each holding a dummy variable for that year. For example, the column for year 2000 must be 1 wherever the main data has a non-NA observation for year 2000, and 0 otherwise. Rows where the observation is NA must be NA in every column. Here is a small sample of the input data:

df:
2000    NA
2001    NA
2002   -1.3
2000    1.1
2001    0
2002    NA
2000   -3
2001    3
2002    4.1

Now the output should be:

df1:
2000    2001    2002
 NA      NA      NA
 NA      NA      NA
 0       0       1
 1       0       0
 0       1       0
 NA      NA      NA
 1       0       0
 0       1       0
 0       0       1

I would prefer to obtain this output by using a "for loop", if possible. Otherwise, any simpler approach will be appreciated.


Answer 1:


No loop is needed. We can use model.matrix:

## your data variable and NA index
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)

## code your year variable as a factor
year <- factor(rep(2000:2002, 3))

## original model matrix; drop the intercept so that no contrasts are applied
## and every year gets its own dummy column
X <- model.matrix(~ year - 1)

#  year2000 year2001 year2002
#1        1        0        0
#2        0        1        0
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6        0        0        1
#7        1        0        0
#8        0        1        0
#9        0        0        1

## put NA where `x` is NA (the length-9 logical index is recycled over all 3 columns)
X[na_id] <- NA

#  year2000 year2001 year2002
#1       NA       NA       NA
#2       NA       NA       NA
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6       NA       NA       NA
#7        1        0        0
#8        0        1        0
#9        0        0        1
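
The recycling step can be made explicit. The following is a minimal sketch, assuming the same `x` and `year` as above; the names `X1` and `X2` are introduced here only for the comparison. A logical vector of length 9 used as a single index on a 9-by-3 matrix is recycled down each column in turn, so it marks the same rows in every column, which amounts to indexing whole rows. This relies on the index having exactly as many elements as the matrix has rows.

```r
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)
year <- factor(rep(2000:2002, 3))

X1 <- model.matrix(~ year - 1)
X2 <- model.matrix(~ year - 1)

X1[na_id] <- NA     # vector-style index, recycled over all 27 elements
X2[na_id, ] <- NA   # explicit row index, same rows blanked in every column

identical(X1, X2)   # compare the two results
```

The row-index form `X[na_id, ] <- NA` is arguably easier to read and does not depend on recycling.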

The matrix X carries two extra attributes, "assign" and "contrasts". You can drop them if you want:

attr(X, "assign") <- attr(X, "contrasts") <- NULL

You can also rename the columns of this matrix, for example:

colnames(X) <- 2000:2002
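
Since the question mentions a preference for a "for loop", here is a minimal sketch of that route too. It is not part of the model.matrix approach above; the names `years` and `D` are hypothetical, and the sample data is assumed to be the same as in the question.

```r
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
year <- rep(2000:2002, 3)

years <- sort(unique(year))

## one column per distinct year, pre-filled with NA
D <- matrix(NA_integer_, nrow = length(x), ncol = length(years),
            dimnames = list(NULL, as.character(years)))

for (j in seq_along(years)) {
  ## 1 where the row belongs to this year, 0 otherwise
  D[, j] <- as.integer(year == years[j])
}

## rows with a missing observation become NA in every column
D[is.na(x), ] <- NA
D
```

The loop only runs once per distinct year, so it stays cheap even for long data; model.matrix remains the shorter option.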


Source: https://stackoverflow.com/questions/39802092/create-a-matrix-of-dummy-variables-from-my-data-frame-use-na-for-missing-valu
