Question
(Related question that does not include sorting. It's easy to just use paste when you don't need to sort.)
I have a less-than-ideally-structured table with character columns that have generic names such as "item1", "item2", etc. I would like to create a new character variable that is the alphabetized, comma-separated concatenation of these columns. So for example, in row 5, if item1 = "milk", item2 = "eggs", and item3 = "butter", the new variable in row 5 would be "butter, eggs, milk".
I wrote a function f() below that works on two character variables. However, I am having trouble:
- Using mapply or other "vectorization" (I know it's really just a for loop)
- Generalizing the function to an arbitrary number of columns
Any help much appreciated.
df <- data.frame(a = c("foo","bar"),
                 b = c("baz","qux"))
paste(df$a,df$b, sep=", ")
# returns [1] "foo, baz" "bar, qux" ... but I want [1] "baz, foo" "bar, qux"
f <- function(a,b) paste(c(a,b)[order(c(a,b))],collapse=", ")
f("foo","baz")
# returns [1] "baz, foo" ... which is what I want ... how to vectorize?
df$new_var <- mapply(f, df$a, df$b)
df
#     a   b new_var   <- new_var is not what I want
# 1 foo baz    1, 2
# 2 bar qux    1, 2
# Interestingly, data.table is smart enough to fix my bad mapply
library(data.table)
dt <- data.table(a = c("foo","bar"),
                 b = c("baz","qux"))
dt[,new_var:=mapply(f, a, b)]
dt
#      a   b  new_var   <- new_var IS what I want
# 1: foo baz baz, foo
# 2: bar qux bar, qux
Answer 1:
My first thought would've been to do this:
dt[, new_var := paste(sort(.SD), collapse = ", "), by = 1:nrow(dt)]
But you could make your function work with a couple of simple modifications:
f = function(...) paste(c(...)[order(c(...))],collapse=", ")
dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
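The same idea can be sketched in base R. The "1, 2" result in the question most likely comes from data.frame() converting strings to factors (the default before R 4.0.0), so that c(a, b) combined the integer codes rather than the labels; data.table() does not convert strings to factors, which would explain why the data.table version worked. The sketch below assumes character columns, and the third column c and the name cols are hypothetical additions for illustration.

```r
# Assumption: columns are plain character. stringsAsFactors = FALSE is set
# explicitly so the sketch behaves the same on R versions before 4.0.0.
df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 c = c("abc", "zzz"),   # hypothetical third column
                 stringsAsFactors = FALSE)

# Variadic version of f(): sort all arguments, then join them.
f <- function(...) paste(sort(c(...)), collapse = ", ")

# 'cols' is an illustrative name for the columns to combine.
cols <- c("a", "b", "c")

# Pass each selected column to mapply() as a separate, unnamed argument.
call_args <- c(list(FUN = f, USE.NAMES = FALSE), unname(as.list(df[cols])))
df$new_var <- do.call(mapply, call_args)
df$new_var
# [1] "abc, baz, foo" "bar, qux, zzz"
```

Because the columns are collected into a list before the call, changing cols is enough to handle any number of columns.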
Answer 2:
Just apply down rows:
apply(df,1,function(x){
paste(sort(x),collapse = ", ")
})
Wrap it in a function if you want. You'll either have to specify which columns to pass or assume all of them, e.g. apply(df[ ,2:3],1,f()...
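Note that sort(x) is the same as x[order(x)], apart from NA handling: sort() drops NA values by default, while x[order(x)] keeps them at the end.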
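As a minimal sketch of the "wrap it in a function" suggestion, the helper below takes the columns to combine as an argument. The name sorted_concat, its parameters, and the id column are hypothetical and only serve the illustration. apply() coerces the selected columns to a character matrix before calling the function, which is why this approach works on the labels rather than on factor codes.

```r
# Toy data from the question plus a hypothetical non-item column 'id'.
# stringsAsFactors = FALSE keeps the columns as plain character on any R version.
df <- data.frame(id = 1:2,
                 a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: sort and join the chosen columns, row by row.
sorted_concat <- function(data, cols, sep = ", ") {
  # apply() converts the selected columns to a character matrix first.
  out <- apply(data[, cols, drop = FALSE], 1,
               function(x) paste(sort(x), collapse = sep))
  unname(out)  # drop any row names carried along by apply()
}

df$new_var <- sorted_concat(df, c("a", "b"))
df$new_var
# [1] "baz, foo" "bar, qux"
```

Selecting the columns by name rather than by position (df[ ,2:3]) keeps the call valid if the column order changes.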
Source: https://stackoverflow.com/questions/28730186/row-wise-sort-then-concatenate-across-specific-columns-of-data-frame