How to identify “similar” rows in R?

Submitted by 放荡痞女 on 2020-01-06 02:05:07

Question


Hi everyone, I am trying to compare the rows of my data frame and flag the matching ones with a new binary variable. My data frame (DB) looks like this:

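# Build the example data frame
id <- rep(1:2, 5)
date <- seq(from = as.Date("2013-01-01"), to = as.Date("2013-01-05"), by = 1)
trap <- c(1, 1, 3, 1, 4, 2, 3, 4, 1, 4)
DB <- data.frame(id, date, trap)  # the 5 dates are recycled to 10 rows
DB <- DB[order(DB$date), ]
# Make row 2 an exact copy of row 1 (same id, date and trap)
DB$id[2] <- 1
DB$trap[2] <- 1
# Expected output
result <- c("N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y")
DB <- cbind(DB, result)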

I want to identify all the rows for which the id is different but the date and trap values are the same, as reported in the column result.

I have tried some code (derived from another, similar question) using the ave function, but without success. As always, any tips would be appreciated!


Answer 1:


(duplicated(DB[,-1]) | duplicated(DB[,-1],fromLast=TRUE)) & 
         !(duplicated(DB) | duplicated(DB,fromLast=TRUE))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
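The one-liner above combines two tests, which may be easier to follow when split into named steps. This is only a sketch: it rebuilds the example data without the precomputed result column, and the names key, same_date_trap, same_full_row and flag are illustrative, not from the original answer.

```r
# Example data as it looks after the ordering and edits in the question
DB <- data.frame(
  id   = c(1, 1, 2, 1, 1, 2, 2, 1, 1, 2),
  date = as.Date("2013-01-01") + c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  trap = c(1, 1, 1, 3, 3, 4, 1, 1, 4, 4)
)

# TRUE when some other row shares this row's date and trap
key <- DB[, c("date", "trap")]
same_date_trap <- duplicated(key) | duplicated(key, fromLast = TRUE)

# TRUE when some other row is identical in id, date and trap
same_full_row <- duplicated(DB) | duplicated(DB, fromLast = TRUE)

# Keep rows that share date and trap but are not exact copies
flag <- same_date_trap & !same_full_row
DB$result <- ifelse(flag, "Y", "N")
print(DB)
```

One caveat of this logic: a row that has both an exact copy and a match with a different id would still be marked "N", because the second test excludes every exact copy.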



Answer 2:


You could make a double loop over your data frame:

apply(DB, 1, function(r){
  if(any(apply(DB, 1, function(x)(x[1]!= r[1] & all(x[c(2,3)]==r[c(2,3)])))))
    "Y"
  else
    "N"
})        

gives:

  1   6   2   7   3   8   4   9   5  10 
"N" "N" "N" "N" "N" "N" "Y" "Y" "Y" "Y" 
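The double loop compares every row with every other row, so its cost grows quadratically with the number of rows. As a possible alternative, here is a sketch that uses ave (the function mentioned in the question) to count the distinct ids within each date/trap group. It rebuilds the example data on its own, and the names key and n_ids are illustrative assumptions.

```r
# Example data as it looks after the ordering and edits in the question
DB <- data.frame(
  id   = c(1, 1, 2, 1, 1, 2, 2, 1, 1, 2),
  date = as.Date("2013-01-01") + c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  trap = c(1, 1, 1, 3, 3, 4, 1, 1, 4, 4)
)

# One grouping key per row, built from date and trap
key <- paste(DB$date, DB$trap)

# Number of distinct ids in each row's date/trap group
n_ids <- ave(DB$id, key, FUN = function(x) length(unique(x)))

# A row is flagged when its group contains more than one id
DB$result <- ifelse(n_ids > 1, "Y", "N")
print(DB)
```

Unlike a test based on exact duplicates, this flags a row whenever any other id appears with the same date and trap, even if the row also has an exact copy.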


Source: https://stackoverflow.com/questions/16563165/how-to-identify-similar-rows-in-r
