Identifying specific differences between two data sets in R

长发绾君心 2021-02-14 05:52

I would like to compare two data sets and identify specific instances of discrepancies between them (i.e., which variables were different).

While I have found out how t

4 Answers

礼貌的吻别 (original poster)

2021-02-14 06:00

Here is one possibility. First, find the ids that both datasets have in common. The simplest way to do this is:

commonID<-intersect(A$id,B$id)

Then you can find the rows of B whose ids are missing from A:

> B[!B$id %in% commonID,]
#       id       name      dob vaccinedate vaccinename dose
# 3 100002 John Smith 2/5/2010   7/13/2013        HEPB    3
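
The same check can be run in the other direction, and setdiff() returns just the ids without going through commonID. A minimal sketch follows; the two small data frames are made-up stand-ins (the question's real data is not shown here), and the names onlyInA and onlyInB are hypothetical.

```r
# Hypothetical toy data, invented for illustration only.
A <- data.frame(id = c(100000, 100001, 100003), dose = c(1, 2, 3))
B <- data.frame(id = c(100000, 100001, 100002), dose = c(1, 2, 3))

commonID <- intersect(A$id, B$id)

# Rows of B whose id never appears in A (the check used above).
onlyInB <- B[!B$id %in% commonID, ]

# The mirror image: rows of A whose id never appears in B.
onlyInA <- A[!A$id %in% commonID, ]

# setdiff() gives the unmatched ids directly.
idsOnlyInB <- setdiff(B$id, A$id)
idsOnlyInA <- setdiff(A$id, B$id)
```

Checking both directions matters because intersect() silently drops ids that exist on one side only.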

Next, you can restrict both datasets to the ids they have in common.

Acommon<-A[A$id %in% commonID,]
Bcommon<-B[B$id %in% commonID,]

If you can't assume that the ids are in the same order in both datasets, sort them both:

Acommon<-Acommon[order(Acommon$id),]
Bcommon<-Bcommon[order(Bcommon$id),]
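
The element-wise comparison in the next step only makes sense if row i of Acommon and row i of Bcommon refer to the same id. A quick guard along these lines can catch duplicated or misaligned ids before comparing. This is a sketch on made-up data, and it assumes each id should appear at most once per dataset, which the question does not confirm.

```r
# Hypothetical toy data, deliberately in different row orders.
A <- data.frame(id = c(3, 1, 2), dose = c(1, 2, 3))
B <- data.frame(id = c(2, 3, 1), dose = c(3, 1, 9))

commonID <- intersect(A$id, B$id)
Acommon <- A[A$id %in% commonID, ]
Bcommon <- B[B$id %in% commonID, ]

# Sort both by id so rows line up.
Acommon <- Acommon[order(Acommon$id), ]
Bcommon <- Bcommon[order(Bcommon$id), ]

# Stop early if ids repeat or the two id columns do not match row for row.
stopifnot(
  !anyDuplicated(Acommon$id),
  !anyDuplicated(Bcommon$id),
  identical(Acommon$id, Bcommon$id)
)
```

If an id can legitimately occur several times (for example, one row per dose), sorting by id alone is not enough and the sort key would need to include the other identifying columns.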

Now you can see which fields differ like this:

diffs<-Acommon != Bcommon
diffs
#      id  name   dob vaccinedate vaccinename  dose
# 1 FALSE FALSE FALSE       FALSE       FALSE  TRUE
# 2 FALSE  TRUE FALSE        TRUE       FALSE FALSE

This is a logical matrix, and you can do whatever you want with it. For example, to find the total number of discrepancies in each column:

colSums(diffs)
#         id        name         dob vaccinedate vaccinename        dose 
#          0           1           0           1           0           1
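
One caveat: if either dataset contains missing values, != returns NA in those cells, and colSums() then returns NA for the affected columns. A possible workaround, sketched on made-up data, is to count "NA on one side only" as a difference and "NA on both sides" as a match. That convention is an assumption, not something stated in the question. The sketch also assumes character rather than factor columns.

```r
# Hypothetical toy data with some missing values.
Acommon <- data.frame(id = c(1, 2, 3),
                      name = c("Ann", NA, "Cy"),
                      dose = c(1, 2, NA),
                      stringsAsFactors = FALSE)
Bcommon <- data.frame(id = c(1, 2, 3),
                      name = c("Ann", "Bo", "Cy"),
                      dose = c(1, 3, NA),
                      stringsAsFactors = FALSE)

raw <- Acommon != Bcommon          # NA wherever either side is NA

# Cells where both sides are missing are treated as equal.
bothNA <- is.na(Acommon) & is.na(Bcommon)

# An NA in raw means at least one side is missing; count it as a
# difference unless both sides are missing.
diffs <- (raw | is.na(raw)) & !bothNA

colSums(diffs)
```

With this version diffs contains only TRUE and FALSE, so colSums() and logical indexing behave as expected.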

To find all ids where the name is different:

Acommon$id[diffs[,"name"]]
# [1] 100001

And so on.
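
To list every discrepancy at once (which id, which variable, and the two conflicting values), the logical matrix can be turned into a long table with which(..., arr.ind = TRUE). The following is a sketch on made-up data; the names report, valueA and valueB are hypothetical, and values are converted to character so that different column types fit in one column.

```r
# Hypothetical toy data, already restricted to common ids and sorted.
Acommon <- data.frame(id = c(100000, 100001),
                      name = c("Ann", "Bo"),
                      dose = c(3, 2),
                      stringsAsFactors = FALSE)
Bcommon <- data.frame(id = c(100000, 100001),
                      name = c("Ann", "Bob"),
                      dose = c(4, 2),
                      stringsAsFactors = FALSE)

diffs <- Acommon != Bcommon

# Row and column positions of every TRUE cell.
idx  <- which(diffs, arr.ind = TRUE)
rows <- unname(idx[, "row"])
cols <- unname(idx[, "col"])

# Pull one cell at a time and convert it to character.
cellAsText <- function(df) {
  vapply(seq_along(rows),
         function(k) as.character(df[rows[k], cols[k]]),
         character(1))
}

report <- data.frame(
  id       = Acommon$id[rows],
  variable = colnames(diffs)[cols],
  valueA   = cellAsText(Acommon),
  valueB   = cellAsText(Bcommon),
  stringsAsFactors = FALSE
)
report
```

Each row of report is one differing cell, which makes it easy to filter by variable or by id.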
