Determining different rows between two data sets in R

…衆ロ難τιáo~ submitted on 2019-12-08 11:59:05

Question


I have two data files in tab-separated CSV format, laid out as follows:

EP Code    EP Name    Address    Region    ...
101654    Alpha     York Street    Northwest    ...
103628    Beta    5th Avenue    South    ...

EP codes are unique. What I want to do is to compare two files with respect to EP codes, determine the different rows and write them into a new file.

For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could be a file completely including file1 plus 50 rows; or it could be file1 - 10 rows + 60 rows. I want to determine the differences between two data sets. I'm not interested in the mutual rows.

How can I do that in R?


Answer 1:


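There are several ways to do this, including setdiff, intersect, the %in% operator and is.element. Note that setdiff and intersect return EP code values, not row positions or logical vectors, so they cannot be used directly as a row index. The simplest approach is to test which EP codes also occur in the other file with %in% and negate the result with ! (assuming the EP code column is named ep.code in both data frames):

# rows of file1 whose EP code does not appear in file2
diff1 <- file1[!(file1$ep.code %in% file2$ep.code), ]

and

# rows of file2 whose EP code does not appear in file1
diff2 <- file2[!(file2$ep.code %in% file1$ep.code), ]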
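As an end-to-end illustration of the workflow asked about in the question (read both files, keep only the rows whose EP code is missing from the other file, write them to a new file), here is a minimal sketch. It makes several assumptions that are not in the original post: the two small inline tables stand in for file1.csv and file2.csv, the "Gamma" row is invented sample data, the added `source` column and the temporary output path are hypothetical, and the column is called `EP.Code` because read.delim by default converts the header "EP Code" into a syntactically valid name.

```r
# Hypothetical stand-ins for file1.csv and file2.csv (tab separated).
# With real files, replace text = ... with the file path.
file1 <- read.delim(text = "EP Code\tEP Name\tAddress\tRegion
101654\tAlpha\tYork Street\tNorthwest
103628\tBeta\t5th Avenue\tSouth", stringsAsFactors = FALSE)

file2 <- read.delim(text = "EP Code\tEP Name\tAddress\tRegion
103628\tBeta\t5th Avenue\tSouth
104000\tGamma\tMain Road\tEast", stringsAsFactors = FALSE)

# Rows whose EP code occurs in one file but not in the other.
only_in_file1 <- file1[!(file1$EP.Code %in% file2$EP.Code), ]
only_in_file2 <- file2[!(file2$EP.Code %in% file1$EP.Code), ]

# Label where each row came from (assumed extra column).
# rep() keeps this safe even when one side has zero rows.
only_in_file1$source <- rep("file1", nrow(only_in_file1))
only_in_file2$source <- rep("file2", nrow(only_in_file2))

# Stack both sets of differing rows and write them as a tab-separated file.
differences <- rbind(only_in_file1, only_in_file2)
out_path <- tempfile(fileext = ".csv")  # hypothetical output location
write.table(differences, file = out_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Rows shared by both files are dropped, which matches the requirement of ignoring the mutual rows. This compares EP codes only; a row whose code exists in both files but whose other fields differ would not be reported.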


Source: https://stackoverflow.com/questions/3132778/determining-different-rows-between-two-data-sets-in-r
