Question
I guess I just don't see it, but none of the similar things I found on the net, in the mailing list archives, or in the FAQ could really clear up my issue.
The closest I have found was this: apply strsplit rowwise
I have a df, with two character columns and one numerical column. Filled like this:
df <- data.frame(name1 = c("A", "B", "C", "D"),
                 name2 = c("B", "A", "D", "C"),
                 nums  = c(1, 1, 4, 4),
                 stringsAsFactors = FALSE)
Now I would like to find the unique rows in this data frame, but based only on the two name columns. Within a row, the order of the two names has no significance (A, B counts the same as B, A), so I cannot use duplicated directly, if I understood it correctly.
So I thought about combining the two name columns row-wise, sorting each pair, and pasting the sorted vector (length 2) back together, in combination with something like sapply.
However I did not get it to work.
So far, I used a for loop, but this takes ages on the original data.
for (i in seq_len(nrow(df))) {
  # sort the two names of row i, then paste them into one key
  mysort <- sort(c(df$name1[i], df$name2[i]))
  df$combname[i] <- paste(mysort[1], mysort[2])
}
Any suggestions are welcome. Maybe I just misunderstand unique and sapply.
Answer 1:
A solution without a for loop: apply over the rows of the two name columns, sort each pair, and paste it into a single key.
df$combname <- apply(df[1:2], 1, function(x) paste(sort(x), collapse=""))
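The line above only builds the key column; the deduplication step the question asks for still has to follow. A minimal sketch of the full round trip, assuming the df from the question (the name df_unique is only illustrative):

```r
# Example data from the question
df <- data.frame(name1 = c("A", "B", "C", "D"),
                 name2 = c("B", "A", "D", "C"),
                 nums  = c(1, 1, 4, 4),
                 stringsAsFactors = FALSE)

# Order-independent key per row: sort the two names, then paste them together
df$combname <- apply(df[1:2], 1, function(x) paste(sort(x), collapse = ""))

# Keep only the first row seen for each key (df_unique is a hypothetical name)
df_unique <- df[!duplicated(df$combname), ]
df_unique
```

With collapse = "" two different pairs can collide once names are longer than one character (for example "AB","C" and "A","BC" both give "ABC"), so a separator that cannot occur in the names, such as collapse = "|", is probably the safer choice on real data.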
Answer 2:
Perhaps you should explore the "data.table" package. Here's one approach:
library(data.table)
DT <- data.table(df)
DT[, new := paste(sort(c(name1, name2)), collapse = ""), by = 1:nrow(DT)]
DT
#    name1 name2 nums new
# 1:     A     B    1  AB
# 2:     B     A    1  AB
# 3:     C     D    4  CD
# 4:     D     C    4  CD
DT[!duplicated(new), ]
#    name1 name2 nums new
# 1:     A     B    1  AB
# 2:     C     D    4  CD
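Both answers above still call sort once per row. Since the question mentions speed on larger data, here is a sketch of a fully vectorized alternative in base R. It is not taken from either answer: it relies on pmin and pmax accepting character vectors, and the names first, second and df_unique are only illustrative.

```r
# Example data from the question
df <- data.frame(name1 = c("A", "B", "C", "D"),
                 name2 = c("B", "A", "D", "C"),
                 nums  = c(1, 1, 4, 4),
                 stringsAsFactors = FALSE)

# pmin/pmax compare element-wise, so every pair is put in order
# without calling a function once per row
first  <- pmin(df$name1, df$name2)
second <- pmax(df$name1, df$name2)

# duplicated() on the two ordered columns flags repeated pairs,
# so no pasted key (and no separator) is needed
df_unique <- df[!duplicated(data.frame(first, second)), ]
df_unique
```

This only covers the two-column case from the question; with more than two name columns the row-wise sort from the answers would still be needed.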
Source: https://stackoverflow.com/questions/19062699/apply-strsplit-rowwise-including-sort-and-nested-paste