Filling NA row values with nearest right side row value in R

情话喂你 2020-12-21 18:54

I want to convert the given dataframe from

             c1     c2   c3   c4    c5
    VEG PUFF     12     78.43
CHICKEN PUFF &l         


        
2 Answers
  • 2020-12-21 19:38

    Update

    As there was a lot of confusion about the expected output, I am updating the answer with a tidyverse solution, as suggested by @DavidArenburg:

    library(dplyr)
    library(tidyr)
    df %>%
      add_rownames() %>%
      gather(variable, value, -rowname) %>%
      filter(!is.na(value)) %>%
      group_by(rowname) %>%
      mutate(indx = row_number()) %>%
      select(-variable) %>%
      spread(indx, value)
    
    #        rowname   `1`   `2`
    #*        <chr> <dbl> <dbl>
    #1 BAKERY_Total    28 84.04
    #2 CHICKEN_PUFF    16 88.24
    #3     VEG_PUFF    12 78.43
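
    The same reshaping can also be written with the newer tidyr verbs: gather() and spread() have been superseded by pivot_longer() and pivot_wider(), and add_rownames() is deprecated in favour of tibble::rownames_to_column(). The following is a minimal sketch, not part of the original answer; the sample df is an assumption reconstructed from the values shown in this thread.

    ```r
    library(dplyr)
    library(tidyr)
    library(tibble)

    # Assumed sample data, rebuilt from the outputs shown in the answers
    df <- data.frame(
      c1 = c(NA, NA, NA),
      c2 = c(12, 16, NA),
      c3 = c(NA, NA, 28),
      c4 = c(NA, 88.24, NA),
      c5 = c(78.43, NA, 84.04),
      row.names = c("VEG_PUFF", "CHICKEN_PUFF", "BAKERY_Total")
    )

    res <- df %>%
      rownames_to_column("rowname") %>%
      # long format, dropping the NA cells as we go
      pivot_longer(-rowname, names_to = "variable", values_to = "value",
                   values_drop_na = TRUE) %>%
      group_by(rowname) %>%
      mutate(indx = row_number()) %>%   # position among the non-NA values
      ungroup() %>%
      select(-variable) %>%
      pivot_wider(names_from = indx, values_from = value)

    res
    ```

    The result has one row per original row name and columns named `1` and `2` holding the non-NA values in left-to-right order.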
    

    Another solution could be

    library(data.table)
    # For each row, keep only the non-NA values as a one-row data.frame
    temp <- apply(df, 1, function(x) data.frame(matrix(x[!is.na(x)], nrow = 1)))
    rbindlist(temp, fill = TRUE)
    

    Previous Answer

    If I have understood you correctly, you are trying to replace each NA in a row with the nearest non-NA value to its right in the same row.

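    We can use na.locf from the zoo package with fromLast = TRUE, so that values are carried backward (right to left) along each row:

    library(zoo)
    t(apply(df, 1, function(x) na.locf(x, fromLast = TRUE, na.rm = FALSE)))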
    
    
    #             c1 c2    c3    c4    c5
    #VEG_PUFF     12 12 78.43 78.43 78.43
    #CHICKEN_PUFF 16 16 88.24 88.24    NA
    #BAKERY_Total 28 28 28.00 84.04 84.04
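
    To see what na.locf does to a single row, it can help to run it on plain vectors first. This is a small sketch, assuming the zoo package is installed; the two vectors are assumptions chosen to mirror the VEG_PUFF and CHICKEN_PUFF rows above.

    ```r
    library(zoo)

    # Assumed row contents, matching the output shown above
    veg     <- c(NA, 12, NA, NA, 78.43)
    chicken <- c(NA, 16, NA, 88.24, NA)

    # fromLast = TRUE fills each NA from the next non-NA value to its right;
    # na.rm = FALSE keeps NAs that have nothing to their right
    veg_filled     <- na.locf(veg, fromLast = TRUE, na.rm = FALSE)
    chicken_filled <- na.locf(chicken, fromLast = TRUE, na.rm = FALSE)

    veg_filled
    # 12.00 12.00 78.43 78.43 78.43
    chicken_filled
    # 16.00 16.00 88.24 88.24    NA
    ```

    The trailing NA in the second vector stays because there is no value to its right to carry backward.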
    
  • 2020-12-21 19:43

    We can use na.omit (this returns a matrix only when every row has the same number of non-NA values):

    t(apply(df, 1, na.omit))
    #             [,1]  [,2]
    #VEG PUFF       12 78.43
    #CHICKEN PUFF   16 88.24
    #BAKERY Total   28 84.04
    

    Update

    Based on the Excel data shown, where rows can have different numbers of non-NA values:

    lst <- apply(df, 1, na.omit)
    df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
    row.names(df2) <- row.names(df)
    

    Another option is melt/dcast from data.table:

    library(data.table)
    dcast(melt(setDT(df1, keep.rownames = TRUE), id.vars = 'rn',
               na.rm = TRUE), rn ~ paste0("c", rowid(rn)), value.var = "value")
    #             rn c1    c2  c3
    #1: BAKERY Total 28 84.04  NA
    #2: CHICKEN PUFF 16 88.24 143
    #3:     VEG PUFF 12 78.43  NA
    

    To provide a reproducible example,

    df1 <- structure(list(c1 = c(NA, NA, NA), c2 = c(12L, 16L, NA), c3 = c(NA, 
    NA, 28L), c4 = c(NA, 88.24, NA), c5 = c(78.43, 143, 84.04)), .Names = c("c1", 
    "c2", "c3", "c4", "c5"), class = "data.frame", row.names = c("VEG PUFF", 
    "CHICKEN PUFF", "BAKERY Total"))
    
    lst <- lapply(seq_len(nrow(df1)), function(i) {
                   x1 <- unlist(df1[i,])
                   x1[complete.cases(x1)]})
    df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
    row.names(df2) <- row.names(df1)
    

    The above approach is similar to the apply method, except that we can always be sure it returns a list. With apply, the result can vary: when every row has the same number of elements after removing the NAs, it returns a matrix; otherwise it returns a list. So we loop over the sequence of rows, remove the NA elements, pad each result with NA at the end so that all list elements have the same length, and then rbind them.
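
    The difference in what apply returns can be checked directly. The sketch below uses hypothetical toy data (not from the question) and a helper, drop_na, introduced only for illustration.

    ```r
    # Hypothetical toy data: same_len keeps 2 values per row, ragged keeps 2 and 1
    same_len <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(3, 4))
    ragged   <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(3, NA))

    # Illustrative helper: drop the NA elements of a vector
    drop_na <- function(x) x[!is.na(x)]

    res_same   <- apply(same_len, 1, drop_na)
    res_ragged <- apply(ragged, 1, drop_na)

    is.matrix(res_same)   # TRUE: equal lengths, so apply simplifies the result
    is.list(res_ragged)   # TRUE: lengths differ, so apply returns a list

    # Pad the ragged list with NA to a common length, then bind the rows
    padded <- do.call(rbind,
                      lapply(res_ragged, `length<-`, max(lengths(res_ragged))))
    padded
    ```

    Note that in the matrix case apply puts each original row in a column, which is why the earlier one-liner wraps the call in t().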


    Another option is which with arr.ind = TRUE (ncol = 2 assumes that every row has exactly two non-NA values):

    ind <- which(!is.na(df), arr.ind=TRUE)
    matrix(df[ind[order(ind[,1]),]], ncol=2, byrow=TRUE, 
                dimnames = list(row.names(df), paste0("c", 1:2)))
    #             c1    c2
    #VEG PUFF     12 78.43
    #CHICKEN PUFF 16 88.24
    #BAKERY Total 28 84.04
    