Question
My clinical data structure looks like this:
patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
group <- c(1,1,NA,2,NA,NA,1,1,1,2,2,NA)
Data<- data.frame(patientid=patientid,group=group)
Where group is missing (NA), it should take the value recorded in the other rows for the same patient id. In other words, a patient is always in the same group, and the missing values need to be filled in to reflect that. The result should look like this:
patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
group <- c(1,1,1,2,2,2,1,1,1,2,2,2)
Data<- data.frame(patientid=patientid,group=group)
Answer 1:
You can write a little helper function like:
fun <- function(x) replace(x, is.na(x), x[!is.na(x)][1])
Then you can use it with transform or within in base R:
transform(Data, group = ave(group, patientid, FUN = fun))
# patientid group
# 1 100 1
# 2 100 1
# 3 100 1
# 4 101 2
# 5 101 2
# 6 101 2
# 7 102 1
# 8 102 1
# 9 102 1
# 10 104 2
# 11 104 2
# 12 104 2
Or even with other packages:
library(data.table)
as.data.table(Data)[, group := fun(group), patientid][]
This works even if the first "group" value for a given "patientid" is NA, because the helper takes the first non-NA value rather than the first value. Try, for example:
# First row of "group" is `NA`
Data <- Data[c(3, 1, 2, 4:nrow(Data)), ]
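To check that claim end to end, the sketch below rebuilds the example data, moves an NA into the first row, and applies the same helper through ave. It uses only base R; the name shuffled is introduced here purely for illustration.

```r
# Rebuild the example data from the question
patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
group <- c(1,1,NA,2,NA,NA,1,1,1,2,2,NA)
Data <- data.frame(patientid = patientid, group = group)

# Helper from the answer: replace NAs with the first non-NA value
fun <- function(x) replace(x, is.na(x), x[!is.na(x)][1])

# Reorder so that the first row for patient 100 is NA
# ("shuffled" is a hypothetical name used only in this sketch)
shuffled <- Data[c(3, 1, 2, 4:nrow(Data)), ]

# Fill per patient; ave() returns values in the row order of its input
shuffled$group <- ave(shuffled$group, shuffled$patientid, FUN = fun)

# TRUE would mean some NA was left unfilled
anyNA(shuffled$group)
```

If a patient has no observed group at all, the helper leaves that patient's rows as NA, since there is no non-NA value to copy.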
Answer 2:
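We can use fill from tidyr after grouping by 'patientid'. By default fill carries values downward, which is enough for this data; if a patient's first row could be NA, pass .direction = "downup" so that leading NAs are filled as well.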
library(dplyr)
library(tidyr)
Data %>%
  group_by(patientid) %>%
  fill(group) %>%
  ungroup()
Output:
# A tibble: 12 x 2
# patientid group
# <dbl> <dbl>
# 1 100 1
# 2 100 1
# 3 100 1
# 4 101 2
# 5 101 2
# 6 101 2
# 7 102 1
# 8 102 1
# 9 102 1
#10 104 2
#11 104 2
#12 104 2
Answer 3:
A base R option with ave:
transform(
Data,
group = ave(group, patientid, FUN = function(x) unique(na.omit(x)))
)
which gives
patientid group
1 100 1
2 100 1
3 100 1
4 101 2
5 101 2
6 101 2
7 102 1
8 102 1
9 102 1
10 104 2
11 104 2
12 104 2
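Using unique(na.omit(x)) relies on each patient having exactly one distinct non-missing group, as the question states. A minimal sketch of how that assumption could be checked before filling, in base R; the names n_groups and bad_ids are introduced here for illustration only:

```r
# Rebuild the example data from the question
patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
group <- c(1,1,NA,2,NA,NA,1,1,1,2,2,NA)
Data <- data.frame(patientid = patientid, group = group)

# Number of distinct non-missing group values per patient
n_groups <- tapply(Data$group, Data$patientid,
                   function(x) length(unique(na.omit(x))))

# Patients with zero or several observed groups (hypothetical helper name)
bad_ids <- names(n_groups)[n_groups != 1]

# Stop early if the "exactly one group per patient" assumption is violated
stopifnot(length(bad_ids) == 0)
```

For the example data the check passes silently. If it fails, bad_ids lists the patient ids to inspect before any fill is attempted.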
A data.table option with nafill:
library(data.table)
setDT(Data)[, group := nafill(group, fill = unique(na.omit(group))), patientid]
which gives
> Data
patientid group
1: 100 1
2: 100 1
3: 100 1
4: 101 2
5: 101 2
6: 101 2
7: 102 1
8: 102 1
9: 102 1
10: 104 2
11: 104 2
12: 104 2
Answer 4:
With dplyr:
library(dplyr)
Data %>%
  group_by(patientid) %>%
  mutate(group = ifelse(is.na(group), max(group, na.rm = TRUE), group))
Answer 5:
You can create a mapping between group and patientid, then use this mapping to fill in missing values.
# Create mapping between group and patientid
df = data.frame(patientid, group)
patientid.unique = unique(patientid)
mapping = data.frame(pid=patientid.unique, group=NA)
for (pid in patientid.unique){
  mapping$group[mapping$pid == pid] = unique(df[df$patientid == pid & !is.na(df$group), "group"])
}
> mapping
pid group
1 100 1
2 101 2
3 102 1
4 104 2
# Fill in missing values
group.filled = apply(df, 1, function(x) {mapping$group[mapping$pid == x[1]] })
df$group = group.filled
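The same mapping idea can also be written without the loop and the row-wise apply. The following is a sketch of an alternative, not the answer's own code: it builds the mapping with unique and looks values up with match, assuming each patient has exactly one observed group (otherwise match simply takes the first one it finds).

```r
# Rebuild the example data from the question
patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
group <- c(1,1,NA,2,NA,NA,1,1,1,2,2,NA)
df <- data.frame(patientid, group)

# One row per (patientid, group) pair, after dropping rows with a missing group
mapping <- unique(df[!is.na(df$group), ])

# For every row, find its patientid in the mapping and take that group
df$group <- mapping$group[match(df$patientid, mapping$patientid)]
```

A patient with no observed group has no entry in the mapping, so match returns NA for those rows and they stay missing.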
Source: https://stackoverflow.com/questions/65297927/fill-in-missing-data-for-group-by-unique-id