Get Value of last non-empty column for each row

盖世英雄少女心 2021-01-05 21:27

Take this sample data:

mydf <- data.frame(a_1 = c("Apple", "Grapes", "Melon", "Peach"),
                   a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
                   a_3 = c("Plum", "Apple", NA, NA))


        
3 Answers
  •  时光说笑
    2021-01-05 22:03

    There's no need for regex here. Just use apply + tail + na.omit:

    > apply(mydf, 1, function(x) tail(na.omit(x), 1))
    [1] "Cucumber" "Apple"    "Lime"     "Honey" 
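
To see why this works, it may help to run the pieces on a single row. The following is a minimal sketch, assuming the data has only the three columns shown in the question (so the first row yields "Plum"); `last_non_na` is a hypothetical helper name, not part of the answer.

```r
# Sample data as given in the question (three columns assumed).
mydf <- data.frame(
  a_1 = c("Apple", "Grapes", "Melon", "Peach"),
  a_2 = c("Nuts", "Kiwi", "Lime", "Honey"),
  a_3 = c("Plum", "Apple", NA, NA)
)

# apply() coerces the data frame to a character matrix and
# hands each row to the function as a vector.
row3 <- as.matrix(mydf)[3, ]   # "Melon", "Lime", NA

kept <- na.omit(row3)          # drops the NA, keeps the original order
last_value <- tail(kept, 1)    # last remaining element: "Lime"

# Hypothetical helper wrapping the same idea for every row.
last_non_na <- function(d) {
  apply(d, 1, function(x) tail(na.omit(x), 1))
}

result <- last_non_na(mydf)
print(result)
# "Plum" "Apple" "Lime" "Honey"
```

If a row is entirely NA, tail(na.omit(x), 1) returns a zero-length vector and apply() then returns a list instead of a character vector, so such rows may need separate handling.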
    

    I don't know how this compares in terms of speed, but you can also use a combination of "data.table" and "reshape2", like this:

    library(data.table)
    library(reshape2)
    na.omit(melt(as.data.table(mydf, keep.rownames = TRUE), 
                 id.vars = "rn"))[, value[.N], by = rn]
    #    rn       V1
    # 1:  1 Cucumber
    # 2:  2    Apple
    # 3:  3     Lime
    # 4:  4    Honey
    

    Or, even better:

    melt(as.data.table(mydf, keep.rownames = TRUE), 
         id.vars = "rn", na.rm = TRUE)[, value[.N], by = rn]
    #    rn       V1
    # 1:  1 Cucumber
    # 2:  2    Apple
    # 3:  3     Lime
    # 4:  4    Honey
    

    This would be much faster. On an 800k-row dataset, apply took ~ 50 seconds while the data.table approach took about 2.5 seconds.
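
Timings like these depend on the machine and the data. A rough way to reproduce such a comparison is sketched below, assuming synthetic data (the size `n`, the `fruits` vector and the column layout are made up for illustration) and that the data.table package is installed; melt() is taken from data.table directly rather than from reshape2.

```r
set.seed(1)
n <- 10000  # illustrative size; the figures quoted above are for about 800k rows
fruits <- c("Apple", "Kiwi", "Lime", "Plum")

# Synthetic data: first column always filled, later columns may be NA.
d <- data.frame(
  a_1 = sample(fruits, n, replace = TRUE),
  a_2 = sample(c(fruits, NA), n, replace = TRUE),
  a_3 = sample(c(fruits, NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Approach 1: row-wise apply().
t_apply <- system.time(
  res_apply <- apply(d, 1, function(x) tail(na.omit(x), 1))
)
cat("apply:      ", t_apply[["elapsed"]], "s\n")

# Approach 2: melt to long format, drop NAs, take the last value per row.
has_dt <- requireNamespace("data.table", quietly = TRUE)
if (has_dt) {
  library(data.table)
  t_dt <- system.time(
    res_dt <- melt(as.data.table(d, keep.rownames = TRUE),
                   id.vars = "rn", na.rm = TRUE)[, value[.N], by = rn]
  )
  cat("data.table: ", t_dt[["elapsed"]], "s\n")
}
```

Both approaches should return the same values in the same row order here, because the first column is never NA and melt() stacks the columns in their original order.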
