Question
I have a data set with body temperatures taken every minute for 8 hours. I removed aberrant data and now have NA values, sometimes a single one on its own and sometimes more than 10 in a row. I would like to replace the missing data using linear interpolation.
I tried different things, but I couldn't get 'approx' to work (the NA values stayed NA...), nor could I find a way to tell R to use the value before (same column, one row up) or the value after (same column, one row down). In this example, where I try to replace just one NA, the [+1] and [-1] are not treated as "next row" and "previous row" ([+1] simply selects element 1 and [-1] drops element 1), so it doesn't work:
df$var1_lini <- ifelse (!is.na(df$var1),df$var1,
ifelse (!is.na(df$var[+1]),df$var[-1]+(df$var1[-1]+df$var1[+1])/2,NA))
I'm open to any kind of solution. I am a beginner, so a detailed answer would be great! Thank you.
Eve
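The [+1]/[-1] idea in the question can be expressed in base R by building shifted copies of the column: one shifted down by a row (the previous value) and one shifted up by a row (the next value). A minimal sketch on made-up data, assuming equally spaced readings and isolated NAs (x, prev_x, next_x and x_filled are illustrative names, not from the question):

```r
# Made-up temperature series with two isolated NAs
x <- c(36.5, NA, 36.9, 37.0, NA, 37.4)

# Previous value (row - 1) and next value (row + 1), built by shifting the vector
prev_x <- c(NA, x[-length(x)])   # drop the last element, pad the front with NA
next_x <- c(x[-1], NA)           # drop the first element, pad the end with NA

# For a lone NA between equally spaced readings, linear interpolation
# between its two neighbours is simply their mean
x_filled <- ifelse(is.na(x), (prev_x + next_x) / 2, x)
x_filled
```

Runs of two or more NAs stay NA with this approach, because the neighbour of an NA is then itself NA; stats::approx() handles gaps of any length.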
Answer 1:
Another approach is to fit a linear model to the data you do have and then use that model's predictions to replace the NAs.
A simple example to help you understand is this:
library(ggplot2)
# create example dataset
df = data.frame(value = mtcars$qsec,
time = 1:nrow(mtcars))
# replace some values with NA (you can experiment with different values)
df$value[c(5,12,17,18,30)] = NA
# build linear model based on existing data (model ignores rows with NAs)
m = lm(value ~ time, data = df)
# add predictions as a column
df$pred_value = predict(m, newdata = df)
# replace (only) NAs with predictions
df$interp_value = ifelse(is.na(df$value), df$pred_value, df$value)
# plot existing and interpolated data
ggplot()+
geom_point(data=df, aes(time, value), size=5)+
geom_point(data=df, aes(time, interp_value), col="red")
The black points are the existing values; the red points are the existing values plus the NA replacements.
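Keep in mind that lm() in the example above fits one straight line (value against time) through all observed points, so each NA is replaced by a point on that overall trend, not by a value between its two neighbouring readings. That is regression imputation rather than linear interpolation in the usual sense. For comparison, a minimal base-R sketch on made-up data (temp, minute, reg_fill and interp_fill are illustrative names) that computes both, using stats::approx() for the interpolation:

```r
# Made-up readings, one per minute: a single NA and a run of three NAs
temp <- c(36.8, 36.9, NA, 37.1, 37.0, NA, NA, NA, 37.4, 37.3)
minute <- seq_along(temp)

# Regression imputation as in this answer: one line fitted to all observed points
m <- lm(temp ~ minute)   # rows with NA are dropped by the default na.action
reg_fill <- ifelse(is.na(temp),
                   predict(m, newdata = data.frame(minute = minute)),
                   temp)

# Linear interpolation: each gap is filled along the line joining
# its two observed neighbours (approx() discards the NA pairs)
interp_fill <- approx(x = minute, y = temp, xout = minute)$y

cbind(minute, temp, reg_fill, interp_fill)
```

Passing xout explicitly is what makes approx() return one value per row; without it, approx() returns n = 50 evenly spaced points by default.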
Answer 2:
The easiest way to solve this is to use a package that has functions for replacing missing data, such as imputeTS, forecast or zoo.
The process of replacing missing values with reasonable estimates is also called 'imputation' in statistics.
For interpolating a time series, vector or data.frame, it is as easy as this:
library("imputeTS")
na.interpolation(yourDataWithNAs)   # in imputeTS 3.0 and later this function is named na_interpolation()
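If you would rather avoid a package dependency, a rough base-R counterpart for a data.frame can be sketched with stats::approx(), applied column by column. This is not how imputeTS implements it; it assumes equally spaced rows (one per minute), numeric columns, and at least two non-NA values per column (approx() stops with an error otherwise). interpolate_column and temps are hypothetical names for illustration:

```r
# Hypothetical helper: linearly interpolate the NAs in one numeric vector,
# treating the row index as the (equally spaced) time axis
interpolate_column <- function(v) {
  idx <- seq_along(v)
  # approx() drops the NA pairs; xout = idx requests a value for every row.
  # rule = 2 copies the nearest observed value into leading/trailing NAs.
  approx(x = idx, y = v, xout = idx, rule = 2)$y
}

# Made-up data: two temperature columns with gaps at the start, middle and end
temps <- data.frame(
  var1 = c(NA, 36.6, 36.8, NA, NA, 37.4),
  var2 = c(37.0, NA, 37.2, 37.3, NA, NA)
)

# Apply the helper to every column while keeping the data.frame structure
temps_filled <- temps
temps_filled[] <- lapply(temps, interpolate_column)
temps_filled
```

With the default rule = 1, approx() returns NA for readings before the first or after the last observation, which may be one reason NA values appear to survive an approx() call.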
Keep in mind that there are also imputation methods other than linear interpolation, e.g. moving average imputation or seasonality-based imputation; depending on the problem, another method may give better results (further explanations: Time Series Imputation).
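To give an idea of what moving average imputation does, here is a deliberately simplified base-R sketch: each NA becomes the unweighted mean of the observed values within k readings on either side. It is not the imputeTS implementation; ma_impute and the window size k = 2 are arbitrary choices for illustration, and the data are made up:

```r
# Hypothetical helper: simple (unweighted) moving-average imputation.
# Only originally observed values are averaged; an NA whose whole
# window is missing stays NA.
ma_impute <- function(v, k = 2) {
  out <- v
  for (i in which(is.na(v))) {
    neighbours <- v[max(1, i - k):min(length(v), i + k)]
    if (any(!is.na(neighbours))) out[i] <- mean(neighbours, na.rm = TRUE)
  }
  out
}

# Made-up readings, one per minute
temp <- c(36.8, 36.9, NA, 37.1, 37.0, NA, NA, NA, 37.4, 37.3)
temp_ma <- ma_impute(temp, k = 2)
temp_ma
```

Unlike linear interpolation, the filled values here depend on readings up to k steps away, so a short window smooths less and a long window follows the local trend less closely.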
Source: https://stackoverflow.com/questions/48563436/linear-interpolation-in-time-series-in-r