Question
I have data for several years, with the sequence of years repeated several times. I want the output to have one column per year, each column being a dummy variable for that year. For example, the output column for year 2000 must have the value "1" wherever the main data has a non-NA observation for year 2000, and "0" otherwise. Moreover, rows where the data is NA must remain NA in every column. A small sample of the input data is shown below:
df:
2000 NA
2001 NA
2002 -1.3
2000 1.1
2001 0
2002 NA
2000 -3
2001 3
2002 4.1
Now the output should be:
df1:
2000 2001 2002
NA NA NA
NA NA NA
0 0 1
1 0 0
0 1 0
NA NA NA
1 0 0
0 1 0
0 0 1
I would prefer to obtain this output by using a "for loop", if possible. Otherwise, any simpler approach will be appreciated.
Answer 1:
No loop is needed. We can use model.matrix:
## your data variable and NA index
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)
## code your year variable as a factor
year <- factor(rep(2000:2002, 3))
## original model matrix; drop the intercept so no contrasts are applied and every year gets its own column
X <- model.matrix(~ year - 1)
# year2000 year2001 year2002
#1 1 0 0
#2 0 1 0
#3 0 0 1
#4 1 0 0
#5 0 1 0
#6 0 0 1
#7 1 0 0
#8 0 1 0
#9 0 0 1
## put NA where `x` is NA (we have used recycling rule here)
X[na_id] <- NA
# year2000 year2001 year2002
#1 NA NA NA
#2 NA NA NA
#3 0 0 1
#4 1 0 0
#5 0 1 0
#6 NA NA NA
#7 1 0 0
#8 0 1 0
#9 0 0 1
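The recycling step can be made explicit. The following is a minimal sketch, not part of the original answer, comparing the recycled assignment with plain row indexing; the names X1 and X2 are introduced here for illustration only. Recycling works because a matrix is stored column by column and the logical index has exactly as many elements as the matrix has rows, so the same row pattern is repeated for each column.

```r
## same sample data as above
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
year <- factor(rep(2000:2002, 3))

X1 <- model.matrix(~ year - 1)
X2 <- X1  # copy, so both variants start from the same matrix

## variant 1: length-9 logical index, recycled over all 27 cells (column-major)
X1[is.na(x)] <- NA

## variant 2: explicit row indexing, no recycling involved
X2[is.na(x), ] <- NA

## both variants should blank out the same rows
same <- identical(X1, X2)
```

The explicit form X2[is.na(x), ] <- NA may be easier to read, and it does not depend on the index length matching the number of rows by coincidence.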
Matrix X will have some attributes. You can drop them if you want:
attr(X, "assign") <- attr(X, "contrasts") <- NULL
You can also rename the columns of this matrix to something else, for example:
colnames(X) <- 2000:2002
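Since the question asked for a "for loop" if possible, here is a rough sketch of that approach as well. It is not from the original answer; the names years and D are hypothetical, and it assumes the year variable is available as a plain vector alongside x.

```r
## same sample data as above; `year` is kept as a plain integer vector here
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
year <- rep(2000:2002, 3)

years <- sort(unique(year))  # one output column per distinct year

## pre-allocate the result, filled with NA
D <- matrix(NA_integer_, nrow = length(x), ncol = length(years),
            dimnames = list(NULL, as.character(years)))

for (j in seq_along(years)) {
  ## 1 where the row belongs to this year, 0 otherwise; NA wherever x is NA
  D[, j] <- ifelse(is.na(x), NA_integer_, as.integer(year == years[j]))
}
```

The loop runs once per year rather than once per row, so it stays cheap even for long data. The model.matrix approach is still shorter when the year variable is already a factor.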
Source: https://stackoverflow.com/questions/39802092/create-a-matrix-of-dummy-variables-from-my-data-frame-use-na-for-missing-valu