Imputing missing values linearly in R

名媛妹妹 2021-01-12 18:54

I have a data frame with missing values:

X   Y   Z
54  57  57
100 58  58
NA  NA  NA
NA  NA  NA
NA  NA  NA
60  62  56
NA  NA  NA
NA  NA  NA
69  62  62
         


        
2 Answers
  • 2021-01-12 19:22

    I can recommend the imputeTS package, which I maintain. It is built for time series imputation, but it works for this case too.

    For this case it would work like this:

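    library(imputeTS)
    # na_interpolation() is the name used since imputeTS 3.0;
    # older releases call the same function na.interpolation().
    df$X <- na_interpolation(df$X, option = "linear")
    df$Y <- na_interpolation(df$Y, option = "linear")
    df$Z <- na_interpolation(df$Z, option = "linear")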
    

    As mentioned, the package expects time series or vector input, which is why the function has to be called separately for each column.

    The package also offers many other imputation functions, for example spline interpolation.
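
    To give a feel for what spline interpolation does with this data, here is a rough base R sketch. It does not use imputeTS at all: fill_spline is a hypothetical helper written for this illustration, and the choice of a natural cubic spline via splinefun() is an assumption, not necessarily what the package does internally.

    ```r
    # Hypothetical helper (not part of imputeTS): fill the NAs of a numeric
    # vector with a natural cubic spline fitted through the observed points.
    fill_spline <- function(x) {
      idx <- seq_along(x)
      ok  <- !is.na(x)
      # Fit on the observed (index, value) pairs, then evaluate at every index.
      splinefun(idx[ok], x[ok], method = "natural")(idx)
    }

    # Same data as in the question
    df <- data.frame(
      X = c(54, 100, NA, NA, NA, 60, NA, NA, 69),
      Y = c(57, 58, NA, NA, NA, 62, NA, NA, 62),
      Z = c(57, 58, NA, NA, NA, 56, NA, NA, 62)
    )

    # Apply the helper column by column
    df_spline <- data.frame(lapply(df, fill_spline))
    ```

    The observed values are reproduced exactly and only the gaps are filled. Unlike linear interpolation, a spline can overshoot the range of the neighbouring observations, so it is worth checking the filled values for plausibility.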

  • 2021-01-12 19:32

    Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

    ## Make easily reproducible data
    df <- read.table(text="X   Y   Z
    54  57  57
    100 58  58
    NA  NA  NA
    NA  NA  NA
    NA  NA  NA
    60  62  56
    NA  NA  NA
    NA  NA  NA
    69  62  62", header = TRUE)
    
    ## See how this works on a single vector
    approxfun(1:9, df$X)(1:9)
    # [1]  54 100  90  80  70  60  63  66  69
    
    ## Apply interpolation to each of the data.frame's columns
    data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
    #     X  Y    Z
    # 1  54 57 57.0
    # 2 100 58 58.0
    # 3  90 59 57.5
    # 4  80 60 57.0
    # 5  70 61 56.5
    # 6  60 62 56.0
    # 7  63 62 58.0
    # 8  66 62 60.0
    # 9  69 62 62.0
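
    One caveat with the approach above: the example data starts and ends with observed values. If a column begins or ends with NA, approxfun() leaves those positions as NA by default, because they lie outside the range of the observed points. The sketch below uses a made-up vector x (not from the question) to show the default and the effect of rule = 2.

    ```r
    # Made-up vector with NAs at both ends
    x   <- c(NA, 54, NA, 60, NA)
    idx <- seq_along(x)

    # Default (rule = 1): positions outside the observed range stay NA
    default_fill <- approxfun(idx, x)(idx)
    default_fill
    # [1] NA 54 57 60 NA

    # rule = 2: positions outside the range take the nearest observed value
    edge_fill <- approxfun(idx, x, rule = 2)(idx)
    edge_fill
    # [1] 54 54 57 60 60
    ```

    Whether carrying the nearest value outward is acceptable depends on the data; leaving the edges as NA is the more conservative choice.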
    