Question
How do I fill in missing values in one dataframe with values from another?
Let's say I have two datasets:
dataset1 shows the amount of food produced by a country each day:
##   country day tonnes of food
## 1   china   1              6
## 2   china   1             NA
## 3   china   2              2
## 4   china   2             NA
dataset2 is the average amount of food by day:
##   country day average tonnes of food
## 1   china   1                      6
## 3   china   2                      2
How can I fill in the NAs of dataset1 with the averages from dataset2?
I.e., if is.na(dataset1$tonnes) is TRUE, then fill in with the average for that day from dataset2$averagetonnes.
Answer 1:
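We can use a join in data.table. Note that the chain below prints the filled-in result; assign it to a name to keep it.
library(data.table)
# Join df2's averages onto df1, fill the NAs, then drop the helper column
setDT(df1)[df2, on = c("country", "day")][is.na(tonnes.of.food),
  tonnes.of.food := average.tonnes.of.food][, average.tonnes.of.food := NULL][]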
#    country day tonnes.of.food
#1:    china   1              6
#2:    china   1              6
#3:    china   2              2
#4:    china   2              2
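If the goal is to update df1 itself rather than print a joined copy, an update join is one option. The following is a minimal sketch, assuming the column names tonnes.of.food and average.tonnes.of.food used above and a data.table version that provides fcoalesce (1.12.4 or later). The sample data is rebuilt here from the question so that the snippet runs on its own.

```r
library(data.table)

# Sample data rebuilt from the question (column names are assumed)
df1 <- data.table(country = "china", day = c(1, 1, 2, 2),
                  tonnes.of.food = c(6, NA, 2, NA))
df2 <- data.table(country = "china", day = c(1, 2),
                  average.tonnes.of.food = c(6, 2))

# Update join: for every df1 row matched by df2 on country and day,
# keep the existing value and fall back to df2's average where it is NA.
# The i. prefix refers to the column coming from df2.
df1[df2, on = .(country, day),
    tonnes.of.food := fcoalesce(tonnes.of.food, i.average.tonnes.of.food)]

df1[]
```

Rows of df1 that have no match in df2 are left untouched, so any NAs in them remain.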
Answer 2:
If I understand you correctly, the match function will solve your problem.
Data:
df1 <- data.frame(country = c(rep('china1', 2), rep('china2', 2)), day = c(1, 1, 2, 2), tof = c(6, NA, 2, NA), stringsAsFactors = FALSE)
df2 <- data.frame(country = c('china1', 'china2'), day = c(1, 2), atof = c(6, 2), stringsAsFactors = FALSE)
df1
#  country day tof
#1  china1   1   6
#2  china1   1  NA
#3  china2   2   2
#4  china2   2  NA
This line replaces the NAs with the averages of the corresponding country from the second data.frame, df2. The match function returns a vector of the positions of the matches, and [which(is.na(df1$tof))] selects the indices where there is an NA in the "tof" column. Note that it matches on country only, which works here because each country in this example data appears on a single day.
df1$tof[is.na(df1$tof)] <- df2$atof[match(df1$country,df2$country)][which(is.na(df1$tof))]
df1
#  country day tof
#1  china1   1   6
#2  china1   1   6
#3  china2   2   2
#4  china2   2   2
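When a country appears on more than one day, as in the question's data, matching on country alone would pick the first df2 row for that country regardless of day. One way around this, sketched below in base R, is to match on a combined country/day key. The names key1, key2, idx and missing are illustrative, and the "|" separator assumes that character does not occur in the country names.

```r
# Sample data shaped like the question: one country, two days
df1 <- data.frame(country = rep("china", 4), day = c(1, 1, 2, 2),
                  tof = c(6, NA, 2, NA), stringsAsFactors = FALSE)
df2 <- data.frame(country = c("china", "china"), day = c(1, 2),
                  atof = c(6, 2), stringsAsFactors = FALSE)

# Build a composite key from country and day in both data frames
key1 <- paste(df1$country, df1$day, sep = "|")
key2 <- paste(df2$country, df2$day, sep = "|")

# Position in df2 of the row matching each df1 row (NA if none)
idx <- match(key1, key2)

# Replace only the missing values with the matching average
missing <- is.na(df1$tof)
df1$tof[missing] <- df2$atof[idx[missing]]

df1
```

If a country/day pair in df1 has no counterpart in df2, idx is NA for that row and the value stays NA.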
Source: https://stackoverflow.com/questions/34697032/fill-in-missing-values-nas-with-values-from-another-dataframe-in-r