Take this sample data:
mydf <- data.frame(a_1=c("Apple","Grapes","Melon","Peach"),a_2=c("Nuts","Kiwi","Lime","Honey"),a_3=c("Cucumber","Apple",NA,NA))
There's no need for regex here. Just use apply + tail + na.omit:
> apply(mydf, 1, function(x) tail(na.omit(x), 1))
[1] "Cucumber" "Apple" "Lime" "Honey"
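If you would rather stay in base R but avoid the row-by-row function call, a vectorized variant is possible with max.col. This is only a sketch, not something I have timed: it rebuilds the sample data with the values implied by the output above, and the names last_col and last_val are my own. Note that a row consisting entirely of NA would come back as NA here instead of being dropped.

```r
# Sample data, rebuilt so the block runs on its own.
mydf <- data.frame(
  a_1 = c("Apple", "Grapes", "Melon", "Peach"),
  a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
  a_3 = c("Cucumber", "Apple", NA, NA),
  stringsAsFactors = FALSE
)

# Column index of the last non-NA entry in each row.
# ties.method = "last" picks the right-most TRUE in the logical matrix.
last_col <- max.col(!is.na(mydf), ties.method = "last")

# Pick one cell per row using a two-column (row, column) index matrix.
last_val <- mydf[cbind(seq_len(nrow(mydf)), last_col)]
last_val
# [1] "Cucumber" "Apple"    "Lime"     "Honey"
```

Matrix indexing goes through as.matrix, so with mixed column types everything is returned as character.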
I don't know how this compares in terms of speed, but you can also use a combination of "data.table" and "reshape2", like this:
library(data.table)
library(reshape2)
na.omit(melt(as.data.table(mydf, keep.rownames = TRUE),
id.vars = "rn"))[, value[.N], by = rn]
#    rn       V1
# 1:  1 Cucumber
# 2:  2    Apple
# 3:  3     Lime
# 4:  4    Honey
Or, even better:
melt(as.data.table(mydf, keep.rownames = TRUE),
id.vars = "rn", na.rm = TRUE)[, value[.N], by = rn]
#    rn       V1
# 1:  1 Cucumber
# 2:  2    Apple
# 3:  3     Lime
# 4:  4    Honey
This would be much faster. On an 800k-row dataset, apply took ~50 seconds while the data.table approach took about 2.5 seconds.
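If you want to check the timing on your own machine, something along these lines should work. It is a rough sketch on made-up data: the big data frame, the fruits vector, and the size n are all hypothetical and not the 800k-row dataset mentioned above, so expect different numbers. The first column is kept free of NA so that both approaches return one value per row in the same order. Only data.table is loaded, since it provides its own melt method for data.tables.

```r
library(data.table)

set.seed(1)
n <- 1e4  # hypothetical size; raise it towards 8e5 to approach the scale quoted above
fruits <- c("Apple", "Grapes", "Melon", "Peach", "Lime", NA)

# Synthetic data: a_1 never contains NA, a_2 and a_3 may.
big <- data.frame(
  a_1 = sample(fruits[1:5], n, replace = TRUE),
  a_2 = sample(fruits, n, replace = TRUE),
  a_3 = sample(fruits, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Row-wise base R approach.
t_apply <- system.time(
  res_apply <- apply(big, 1, function(x) tail(na.omit(x), 1))
)

# Melt to long format, drop NAs, keep the last value per row id.
t_dt <- system.time(
  res_dt <- melt(as.data.table(big, keep.rownames = TRUE),
                 id.vars = "rn", na.rm = TRUE)[, value[.N], by = rn]
)

# Both approaches should agree before the timings mean anything.
stopifnot(identical(unname(res_apply), res_dt$V1))

# Elapsed seconds for each approach.
c(apply = t_apply[["elapsed"]], data.table = t_dt[["elapsed"]])
```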