Question
I have two data files in tab-separated CSV format. Both files have the following layout:
EP Code EP Name Address Region ...
101654 Alpha York Street Northwest ...
103628 Beta 5th Avenue South ...
EP codes are unique. What I want to do is to compare two files with respect to EP codes, determine the different rows and write them into a new file.
For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could contain all of file1 plus 50 additional rows, or it could be file1 minus 10 rows plus 60 new rows. I want to determine the differences between the two data sets; I'm not interested in the rows they share.
How can I do that in R?
Answer 1:
There are many ways to do this, including setdiff, intersect, the %in% operator, and is.element. Find the EP codes the two files share and exclude the matching rows using !:
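# rows of file1 whose EP code does not appear in file2
diff1 <- file1[file1$ep.code %in% setdiff(file1$ep.code, file2$ep.code), ]
or
# rows of file2 whose EP code is not among the shared codes
diff2 <- file2[!(file2$ep.code %in% intersect(file2$ep.code, file1$ep.code)), ]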
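To cover the "write them into a new file" part of the question, here is a minimal end-to-end sketch. It assumes the key column is named ep.code, as in the snippets above. The sample data frames, the extra source column and the output path are made up for illustration; with real files, something like read.delim("file1.csv") should give comparable data frames, although the column name it produces may differ.

```r
# Hypothetical sample data standing in for file1.csv and file2.csv
file1 <- data.frame(
  ep.code = c(101654, 103628, 104000),
  ep.name = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  ep.code = c(103628, 104000, 105111, 106222),
  ep.name = c("Beta", "Gamma", "Delta", "Epsilon"),
  stringsAsFactors = FALSE
)

# Rows whose EP code appears in only one of the two data sets
only_in_1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
only_in_2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

# Tag each row with its origin (illustrative extra column), then stack them
only_in_1$source <- rep("file1", nrow(only_in_1))
only_in_2$source <- rep("file2", nrow(only_in_2))
differences <- rbind(only_in_1, only_in_2)

# Write the differing rows to a tab-separated file (path is an assumption)
out_path <- file.path(tempdir(), "differences.csv")
write.table(differences, out_path, sep = "\t", row.names = FALSE, quote = FALSE)
```

The logical index built with %in% keeps whole rows, so the other columns (name, address, region) travel along with each unmatched EP code.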
Source: https://stackoverflow.com/questions/3132778/determining-different-rows-between-two-data-sets-in-r