I want to convert the given dataframe from
c1 c2 c3 c4 c5
VEG PUFF 12 78.43
CHICKEN PUFF &l
Update
As there was a lot of confusion about the expected output, I am updating the answer with a tidyverse solution, as suggested by @DavidArenburg:
library(dplyr)
library(tidyr)
df %>%
tibble::rownames_to_column() %>%  # replaces add_rownames(), which is deprecated in dplyr
gather(variable, value, -rowname) %>%
filter(!is.na(value)) %>%
group_by(rowname) %>%
mutate(indx = row_number()) %>%
select(-variable) %>%
spread(indx, value)
# rowname `1` `2`
#* <chr> <dbl> <dbl>
#1 BAKERY_Total 28 84.04
#2 CHICKEN_PUFF 16 88.24
#3 VEG_PUFF 12 78.43
Another solution could be
library(data.table)
temp <- apply(df, 1, function(x) data.frame(matrix(x[!is.na(x)], nrow = 1)))
rbindlist(temp, fill = TRUE)
Previous Answer
If I have understood you correctly, you are trying to replace the NA values in a row with the next non-NA value in the same row. We can use na.locf (from the zoo package) with fromLast set to TRUE:
library(zoo)
t(apply(df, 1, function(x) na.locf(x, fromLast = TRUE, na.rm = FALSE)))
# c1 c2 c3 c4 c5
#VEG_PUFF 12 12 78.43 78.43 78.43
#CHICKEN_PUFF 16 16 88.24 88.24 NA
#BAKERY_Total 28 28 28.00 84.04 84.04
To drop the NA values from each row instead, we can use na.omit:
t(apply(df, 1, na.omit))
# [,1] [,2]
#VEG PUFF 12 78.43
#CHICKEN PUFF 16 88.24
#BAKERY Total 28 84.04
Based on the Excel data shown, pad the shorter rows with NA so that they can be row-bound:
lst <- apply(df, 1, na.omit)
df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
row.names(df2) <- row.names(df)
Another option is melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df1, keep.rownames=TRUE), id.var = 'rn',
na.rm = TRUE), rn~ paste0("c", rowid(rn)), value.var = "value")
# rn c1 c2 c3
#1: BAKERY Total 28 84.04 NA
#2: CHICKEN PUFF 16 88.24 143
#3: VEG PUFF 12 78.43 NA
To provide a reproducible example,
df1 <- structure(list(c1 = c(NA, NA, NA), c2 = c(12L, 16L, NA), c3 = c(NA,
NA, 28L), c4 = c(NA, 88.24, NA), c5 = c(78.43, 143, 84.04)), .Names = c("c1",
"c2", "c3", "c4", "c5"), class = "data.frame", row.names = c("VEG PUFF",
"CHICKEN PUFF", "BAKERY Total"))
lst <- lapply(seq_len(nrow(df1)), function(i) {
  x1 <- unlist(df1[i, ])
  x1[complete.cases(x1)]
})
df2 <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
row.names(df2) <- row.names(df1)
The above approach is similar to the apply method, except that we can always be sure the output is a list. With apply the result can vary: when every row has the same number of elements after removing the NAs, it returns a matrix; otherwise it returns a list. So we loop over the sequence of rows, remove the NA elements, pad NA at the end so that all list elements have the same length, and then rbind them.
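The difference in return type can be checked on a small case. The following is a minimal sketch in base R; the two toy data frames and the helper name keep_non_na are hypothetical and only serve to illustrate the point, they are not part of the original data.

```r
# Hypothetical toy data: both rows keep two values after NA removal
m_equal <- data.frame(c1 = c(1, NA), c2 = c(NA, 3), c3 = c(2, 4),
                      row.names = c("a", "b"))
# Here row "b" keeps three values, so the lengths differ
m_ragged <- data.frame(c1 = c(1, 5), c2 = c(NA, 3), c3 = c(2, 4),
                       row.names = c("a", "b"))

# Hypothetical helper: drop the NA elements of a vector
keep_non_na <- function(x) x[!is.na(x)]

res_equal  <- apply(m_equal, 1, keep_non_na)
res_ragged <- apply(m_ragged, 1, keep_non_na)

is.matrix(res_equal)   # equal lengths are simplified to a matrix
is.list(res_ragged)    # unequal lengths come back as a list

# A row-wise lapply always returns a list, so padding with `length<-` is safe
lst <- lapply(seq_len(nrow(m_ragged)),
              function(i) keep_non_na(unlist(m_ragged[i, ])))
padded <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
```

Because lengths() on a matrix counts individual cells rather than rows, the padding step only behaves as intended when its input is a list, which is why the lapply version is the safer choice.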
Another option is which with arr.ind=TRUE:
ind <- which(!is.na(df), arr.ind=TRUE)
matrix(df[ind[order(ind[,1]),]], ncol=2, byrow=TRUE,
dimnames = list(row.names(df), paste0("c", 1:2)))
# c1 c2
#VEG PUFF 12 78.43
#CHICKEN PUFF 16 88.24
#BAKERY Total 28 84.04
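Note that the matrix() call above fixes ncol at 2, so it assumes every row has exactly two non-NA values. As a hedged sketch of how the same which(arr.ind = TRUE) idea might be extended to rows of unequal length, the code below re-creates df1 from the reproducible example and pads each row to the widest one. The variable names m, vals, width and out are assumptions introduced for illustration.

```r
# df1 re-created from the reproducible example in this answer
df1 <- data.frame(c1 = c(NA, NA, NA), c2 = c(12L, 16L, NA),
                  c3 = c(NA, NA, 28L), c4 = c(NA, 88.24, NA),
                  c5 = c(78.43, 143, 84.04),
                  row.names = c("VEG PUFF", "CHICKEN PUFF", "BAKERY Total"))

m <- as.matrix(df1)

# Row/column positions of the non-NA cells, sorted by row and then by column
ind <- which(!is.na(m), arr.ind = TRUE)
ind <- ind[order(ind[, "row"], ind[, "col"]), ]

# Group the non-NA values by row index, then pad each group with NA
vals  <- split(m[ind], ind[, "row"])
width <- max(lengths(vals))
out   <- do.call(rbind, lapply(vals, `length<-`, width))
dimnames(out) <- list(row.names(df1), paste0("c", seq_len(width)))
out
```

With this data, CHICKEN PUFF keeps three values and the other two rows are padded with NA in the third column, which is the same shape as the melt/dcast result.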