R: create a new data frame variable from a subset of two variables with missing data (NA)

Question

I have a simple example data frame with two data columns (data1 and data2) and two grouping variables (Measure1 and Measure2). Both Measure1 and Measure2 contain missing values (NA).

d <- data.frame(Measure1 = 1:2, Measure2 = 3:4, data1 = 1:10, data2 = 11:20) 
d$Measure1[4]=NA 
d$Measure2[8]=NA 
d

   Measure1 Measure2 data1 data2
1         1        3     1    11
2         2        4     2    12
3         1        3     3    13
4        NA        4     4    14
5         1        3     5    15
6         2        4     6    16
7         1        3     7    17
8         2       NA     8    18
9         1        3     9    19
10        2        4    10    20

I want to create a new variable (d$new) that contains data1, but only for rows where Measure1 equals 1. I tried this and get the following error:

d$new[d$Measure1 == 1] = d$data1[d$Measure1 == 1]

Error in d$new[d$Measure1 == 1] = d$data1[d$Measure1 == 1] : NAs are not allowed in subscripted assignments

Next I would like to add to d$new the data from data2 only for rows where Measure2 equals 4. However, the missing data in Measure1 and Measure2 is causing problems in subsetting the data and assigning it to a new variable. I can think of some overly complicated solutions, but I'm sure there's an easy way I'm not thinking of. Thanks for the help!

Answer 1:

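The comparison d$Measure1 == 1 returns NA for the row where Measure1 is missing, and R does not allow NA in the index of a subscripted assignment. So first find the rows where Measure1 is not NA and equals the value you want.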

measure1_notNA = which(!is.na(d$Measure1) & d$Measure1 == 1)

Initialize your new column with some default value.

d$new = NA

Replace only those rows with corresponding values from data1 column.

d$new[measure1_notNA] = d$data1[measure1_notNA]

Or, in one line:

d$new[d$Measure1 == 1 & !is.na(d$Measure1)] = d$data1[d$Measure1 == 1 & !is.na(d$Measure1)]
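The answer above covers only the Measure1 step. As a minimal sketch, the same idea can be extended to the Measure2 step from the question. It assumes that rows where Measure2 equals 4 should simply overwrite whatever is already in new; no row in the example data matches both conditions. The names rows1 and rows2 are illustrative and not from the original answer.

```r
# Rebuild the example data from the question.
d <- data.frame(Measure1 = 1:2, Measure2 = 3:4, data1 = 1:10, data2 = 11:20)
d$Measure1[4] <- NA
d$Measure2[8] <- NA

# Comparing NA with == gives NA rather than FALSE,
# so this logical vector has an NA in position 4.
d$Measure1 == 1

# which() returns only the positions that are TRUE, so NA rows drop out.
rows1 <- which(d$Measure1 == 1)
rows2 <- which(d$Measure2 == 4)

# Start from an all-NA column, then fill it in two steps.
d$new <- NA
d$new[rows1] <- d$data1[rows1]
d$new[rows2] <- d$data2[rows2]

d$new
# expected values: 1, 12, 3, 14, 5, 16, 7, NA, 9, 20
```

Row 8 stays NA because its Measure1 is 2 and its Measure2 is missing, so neither condition selects it.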

Answer 2:

Based on the description, it seems that the OP wants to create a column 'new' based on two columns: where Measure1==1, take the corresponding elements of 'data1'; where Measure2==4, take the corresponding 'data2' values; and fill the rest with NA. We can use ifelse:

 d$new <- with(d, ifelse(Measure1==1 & !is.na(Measure1), data1,
                             ifelse(Measure2==4, data2, NA)))
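A possible shorthand for the explicit NA check is %in%, which returns FALSE rather than NA when the left-hand value is missing. The sketch below is a variation on the answer's code, not part of it, and the names new_in and new_eq are illustrative. It applies both forms to the question's data so they can be compared.

```r
# Rebuild the example data from the question.
d <- data.frame(Measure1 = 1:2, Measure2 = 3:4, data1 = 1:10, data2 = 11:20)
d$Measure1[4] <- NA
d$Measure2[8] <- NA

# %in% never yields NA, so no separate !is.na() test is needed.
new_in <- with(d, ifelse(Measure1 %in% 1, data1,
                         ifelse(Measure2 %in% 4, data2, NA)))

# The form used in the answer, with the explicit NA check.
new_eq <- with(d, ifelse(Measure1 == 1 & !is.na(Measure1), data1,
                         ifelse(Measure2 == 4, data2, NA)))

new_in
# expected values: 1, 12, 3, 14, 5, 16, 7, NA, 9, 20

# Compare the two results element by element.
all.equal(new_in, new_eq)
```

In the answer's version the inner Measure2==4 test can be left unguarded: an NA test in ifelse produces NA in the result, which is the desired fill value here.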

We could also do this with data.table by assigning (:=) in two steps. Convert the 'data.frame' to a 'data.table' (setDT(d)). Based on the logical condition (Measure1==1 & !is.na(Measure1)), we assign the column 'new' from 'data1'. This creates the column with values from 'data1' in the rows where the condition is TRUE and NA in the rest. In the second step, we do the same using 'Measure2' and 'data2'.

 library(data.table) 
 setDT(d)[Measure1==1 & !is.na(Measure1), new:= data1]
 d[Measure2==4, new:= data2]
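The two approaches agree on the question's data, but they would differ for a row that satisfies both conditions: the nested ifelse keeps the first match (data1), while two sequential assignments let the second one (data2) overwrite it. The sketch below illustrates this with a hypothetical two-row data frame x that is not part of the question, using base R assignment in place of data.table to stay dependency-free.

```r
# Hypothetical data: row 1 matches both Measure1 == 1 and Measure2 == 4.
x <- data.frame(Measure1 = c(1L, NA), Measure2 = c(4L, 4L),
                data1 = c(100L, 200L), data2 = c(-1L, -2L))

# Nested ifelse: the first condition that is TRUE decides the value.
nested <- with(x, ifelse(Measure1 %in% 1, data1,
                         ifelse(Measure2 %in% 4, data2, NA)))

# Sequential assignment: the later step overwrites the earlier one.
sequential <- rep(NA_integer_, nrow(x))
i1 <- which(x$Measure1 == 1)
i2 <- which(x$Measure2 == 4)
sequential[i1] <- x$data1[i1]
sequential[i2] <- x$data2[i2]

nested      # expected values: 100, -2
sequential  # expected values: -1, -2
```

Which behavior is correct depends on which measure should take priority; the question does not say, so swapping the order of the conditions or of the assignments may be needed.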

Source: https://stackoverflow.com/questions/32128783/r-create-new-dataframe-variable-from-subset-of-two-variables-with-missing-data

Tags

merge

dataframe

subset

missing-data