I have a large dataset, over 1.5 million rows, from 600k unique subjects, so a number of subjects have multiple rows. I am trying to find the cases where one of the subj
One approach using plyr:
library(plyr)
zz <- ddply(test, "ID", summarise, dups = length(unique(DOB)))
zz[zz$dups > 1, ]
And if base R is your thing, use aggregate():
zzz <- aggregate(DOB ~ ID, data = test, FUN = function(x) length(unique(x)))
zzz[zzz$DOB > 1, ]
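Both approaches return one row per ID with a count. If you would rather see the full offending rows, one possible variation is base R's ave(), which attaches the per-ID count of distinct DOB values to every row. This is only a sketch: the toy `test` data frame below is made up for illustration, and it assumes your columns are named ID and DOB as in the code above.

```r
# Hypothetical toy data: ID 2 has two different DOB values
test <- data.frame(
  ID  = c(1, 1, 2, 2, 3),
  DOB = as.Date(c("1980-01-01", "1980-01-01",
                  "1975-05-05", "1975-06-05",
                  "1990-09-09"))
)

# For every row, count the distinct DOB values within that row's ID.
# as.numeric() turns the dates into plain numbers so ave() returns a numeric vector.
n_dob <- ave(as.numeric(test$DOB), test$ID,
             FUN = function(x) length(unique(x)))

# Keep all rows belonging to IDs with more than one distinct DOB
conflicts <- test[n_dob > 1, ]
conflicts
```

Because `n_dob` has the same length as `test`, you can also keep it as a column and sort or filter on it later.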