Imputation in R

别跟我提以往 2021-02-01 10:58

I am new to the R programming language. Is there a way to impute the missing (NA) values of just one column in a dataset? Because all of imputation co

3 Answers
  • 2021-02-01 11:44

    Here is an example using the impute function from the Hmisc package:

    library(Hmisc)
    DF <- data.frame(age = c(10, 20, NA, 40), sex = c('male','female'))
    
    # impute with mean value
    
    DF$imputed_age <- with(DF, impute(age, mean))
    
    # impute with random value
    DF$imputed_age2 <- with(DF, impute(age, 'random'))
    
    # impute with the median
    with(DF, impute(age, median))
    # impute with the minimum
    with(DF, impute(age, min))
    
    # impute with the maximum
    with(DF, impute(age, max))
    
    
    # and if you are sufficiently foolish
    # impute with the number 7
    with(DF, impute(age, 7))
    
    # impute with the letter 'a'
    with(DF, impute(age, 'a'))
    

    Look at ?impute for details on how the imputation is implemented.
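
    For a single column, the same mean imputation can also be written in base R without any package. This is only a minimal sketch: it rebuilds the toy data frame from above and fills just `age`. The `age_was_missing` column is an illustrative addition, not something Hmisc creates.

    ```r
    # Same toy data as above; only `age` has a missing value
    DF <- data.frame(age = c(10, 20, NA, 40),
                     sex = c("male", "female", "male", "female"))

    # Remember which rows were missing before filling them (hypothetical helper column)
    DF$age_was_missing <- is.na(DF$age)

    # Replace NA in this one column with the mean of the observed values
    DF$age[is.na(DF$age)] <- mean(DF$age, na.rm = TRUE)

    DF
    ```

    Swapping `mean` for `median`, `min` or `max` gives the other variants shown above; every other column is left untouched.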

  • 2021-02-01 11:44

    Why not use a more sophisticated imputation algorithm, such as mice (Multiple Imputation by Chained Equations)? Below is an R code snippet you can adapt to your case.

    library(mice)
    
    #get the nhanes dataset
    dat <- mice::nhanes
    
    #impute it with mice (m = 3 creates three imputed datasets)
    imp <- mice(dat, m = 3, printFlag = FALSE)
    
    #extract the first of the three imputed datasets
    imputed_dataset_1 <- complete(imp, 1)
    
    head(imputed_dataset_1)
    
    #     age  bmi hyp chl
    # 1   1   22.5   1 118
    # 2   2   22.7   1 187
    # 3   1   30.1   1 187
    # 4   3   24.9   1 186
    # 5   1   20.4   1 113
    # 6   3   20.4   1 184
    
    #Now, let's see what methods have been used to impute each column
    meth<-imp$method
    #  age   bmi   hyp   chl
    #"" "pmm" "pmm" "pmm"
    
    #The age column is complete, so it won't be imputed
    # Columns bmi, hyp and chl are going to be imputed with pmm (predictive mean matching)
    
    #Let's say that we want to impute only the "hyp" column
    #So, we set the methods for the bmi and chl columns to ""
    #(indexing by name is safer than by position)
    meth[c("bmi", "chl")] <- ""
    #age   bmi   hyp   chl 
    #""    "" "pmm"    "" 
    
    #Let's run the mice imputation again, this time passing our modified vector as the method argument
    imp <- mice(mice::nhanes, m = 3, printFlag = FALSE, method = meth)
    
    partly_imputed_dataset_1 <- complete(imp, 3)
    
    head(partly_imputed_dataset_1)
    
    #    age  bmi hyp chl
    # 1   1   NA   1  NA
    # 2   2 22.7   1 187
    # 3   1   NA   1 187
    # 4   3   NA   2  NA
    # 5   1 20.4   1 113
    # 6   3   NA   2 184
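
    To make the idea behind pmm (predictive mean matching) more concrete, here is a rough base R sketch for a single column. It is a simplified illustration, not how mice implements it, and it omits refinements of the real implementation. The choice of the built-in `airquality` data, the predictors `Temp` and `Wind`, and `k = 5` donors are all assumptions made for this example.

    ```r
    set.seed(1)                      # donor sampling is random
    dat <- datasets::airquality      # Ozone has NAs; Temp and Wind are complete

    miss <- is.na(dat$Ozone)

    # 1. Fit a regression on the rows where the target column is observed
    fit <- lm(Ozone ~ Temp + Wind, data = dat[!miss, ])

    # 2. Predict the target for every row, observed or not
    pred <- predict(fit, newdata = dat)

    # 3. For each missing row, pick one of the k observed rows whose
    #    prediction is closest and copy that row's observed value
    k <- 5
    obs_idx <- which(!miss)
    for (i in which(miss)) {
      nearest <- obs_idx[order(abs(pred[obs_idx] - pred[i]))[1:k]]
      dat$Ozone[i] <- dat$Ozone[sample(nearest, 1)]
    }

    head(dat)
    ```

    Because every filled-in value is copied from an observed row, the imputed column only ever contains values that actually occur in the data. Only `Ozone` is changed; the missing values in `Solar.R` stay as they are.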
    
  • 2021-02-01 11:57

    There are plenty of packages that can do this for you. (A little more information about your data would help in suggesting the best options.)

    One example is the VIM package.

    It has a function called kNN (k-nearest-neighbor imputation). This function has an argument named variable, where you can specify which variables should be imputed.

    Here is an example:

    library("VIM")
    kNN(sleep, variable = c("NonD","Gest"))
    

    The sleep dataset used in this example ships with VIM.
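
    If you want to see roughly what k-nearest-neighbor imputation of one column does, the following base R sketch may help. It uses plain Euclidean distance on standardised predictors and the median of the donors, which are assumptions of this sketch and not necessarily what VIM does internally. The `airquality` data, the predictor columns and `k = 5` are likewise chosen only for illustration.

    ```r
    dat <- datasets::airquality      # Ozone has NAs; Temp and Wind are complete
    miss <- is.na(dat$Ozone)

    # Standardise the predictor columns so that neither dominates the distance
    X <- scale(dat[, c("Temp", "Wind")])

    k <- 5
    obs_idx <- which(!miss)
    for (i in which(miss)) {
      # Euclidean distance from row i to every row with an observed Ozone value
      d <- sqrt(rowSums(sweep(X[obs_idx, , drop = FALSE], 2, X[i, ])^2))
      # Fill the gap with the median of the k nearest donors' observed values
      dat$Ozone[i] <- median(dat$Ozone[obs_idx[order(d)[1:k]]])
    }

    summary(dat$Ozone)
    ```

    The donors are always rows where `Ozone` was originally observed, so earlier imputations never feed into later ones.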

    If the column you want to impute has some time dependency, a time series imputation package could also make sense. In this case you could use, for example, the imputeTS package. Here is an example:

    library(imputeTS)
    # na_kalman() is the current name; older imputeTS releases called it na.kalman()
    na_kalman(tsAirgap)
    

    The tsAirgap dataset used here as an example also ships with imputeTS.
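
    As a much simpler illustration of time-aware imputation, here is a base R sketch that fills gaps by linear interpolation between neighbouring observations. Note that this is a different and cruder technique than the Kalman smoothing above. The series `y` is made up for the example.

    ```r
    # Hypothetical series with gaps, observed at equally spaced time points
    y <- c(2, NA, 6, 7, NA, NA, 13)
    time_idx <- seq_along(y)
    gap <- is.na(y)

    # Interpolate linearly between the nearest observed points on each side
    filled <- y
    filled[gap] <- approx(x = time_idx[!gap], y = y[!gap], xout = time_idx[gap])$y

    filled
    # values: 2 4 6 7 9 11 13
    ```

    With the default settings, `approx` does not extrapolate, so gaps at the very start or end of a series would stay NA with this approach.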
