I would like to compare two data sets and identify specific instances of discrepancies between them (i.e., which variables were different).
While I have found out how t
Here is one possibility. First, find out which ids the two datasets have in common. The simplest way to do this is:
commonID<-intersect(A$id,B$id)
Then you can find the rows of B that are missing from A:
B[!B$id %in% commonID,]
# id name dob vaccinedate vaccinename dose
# 3 100002 John Smith 2/5/2010 7/13/2013 HEPB 3
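The lines above only look in one direction. A minimal sketch for checking both directions with `setdiff()`, using hypothetical toy ids (not the data from the question):

```r
# Hypothetical toy data: only the id column matters for this step
A <- data.frame(id = c(100000, 100001, 100003))
B <- data.frame(id = c(100000, 100001, 100002))

onlyInB <- setdiff(B$id, A$id)  # ids present in B but missing from A
onlyInA <- setdiff(A$id, B$id)  # ids present in A but missing from B

B[B$id %in% onlyInB, , drop = FALSE]  # rows missing from A
A[A$id %in% onlyInA, , drop = FALSE]  # rows missing from B
```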
Next, you can restrict both datasets to the ids they have in common.
Acommon<-A[A$id %in% commonID,]
Bcommon<-B[B$id %in% commonID,]
If you can't assume that the ids are in the same order in both datasets, then sort them both:
Acommon<-Acommon[order(Acommon$id),]
Bcommon<-Bcommon[order(Bcommon$id),]
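If you would rather keep A's original row order instead of sorting, `match()` can line B up with A directly. This is an alternative to sorting, sketched with hypothetical toy data, and it assumes each id appears at most once in B (`match()` returns only the first hit):

```r
# Hypothetical toy data; ids are in different orders in A and B
A <- data.frame(id = c(100001, 100000), dose = c(2, 1))
B <- data.frame(id = c(100000, 100002, 100001), dose = c(2, 3, 2))

commonID <- intersect(A$id, B$id)
Acommon <- A[A$id %in% commonID, ]
# For each id in Acommon, match() gives its row position in B,
# so Bcommon comes out in exactly the same row order as Acommon
Bcommon <- B[match(Acommon$id, B$id), ]

Acommon != Bcommon  # rows now line up by id
```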
Now you can see which fields differ. This compares the two data frames cell by cell, so they need the same columns in the same order:
diffs<-Acommon != Bcommon
diffs
# id name dob vaccinedate vaccinename dose
# 1 FALSE FALSE FALSE FALSE FALSE TRUE
# 2 FALSE TRUE FALSE TRUE FALSE FALSE
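One caveat: `!=` returns `NA` wherever either value is missing, and those `NA`s then propagate into `colSums()` and into logical subsetting. A minimal sketch of an NA-aware comparison, where `neqNA` is a hypothetical helper name and a cell counts as different when exactly one side is missing. It also assumes character columns rather than factors (the `data.frame()` default since R 4.0.0), since comparing factors with different level sets raises an error.

```r
# Hypothetical toy data with missing values, already aligned by id
Acommon <- data.frame(id = c(1, 2, 3), dose = c(1, NA, NA))
Bcommon <- data.frame(id = c(1, 2, 3), dose = c(1, 2, NA))

# Hypothetical helper: TRUE where two cells differ, treating NA vs NA
# as equal and NA vs a non-missing value as a difference
neqNA <- function(x, y) {
  bothPresent <- !is.na(x) & !is.na(y)
  (x != y & bothPresent) | xor(is.na(x), is.na(y))
}

Acommon != Bcommon               # the dose column contains NA
diffs <- neqNA(Acommon, Bcommon)
colSums(diffs)                   # id = 0, dose = 1
```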
This is a logical matrix, and you can work with it like any other matrix. For example, to count the differences in each column:
colSums(diffs)
# id name dob vaccinedate vaccinename dose
# 0 1 0 1 0 1
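The same matrix can also be summarised row by row. A short sketch using hypothetical toy data, counting the differing fields in each record and listing the ids with at least one discrepancy:

```r
# Hypothetical toy data, already restricted to common ids and sorted by id
Acommon <- data.frame(id = c(100000, 100001, 100003),
                      name = c("Ann Lee", "Bo Chan", "Cy Park"),
                      dose = c(1, 2, 1), stringsAsFactors = FALSE)
Bcommon <- data.frame(id = c(100000, 100001, 100003),
                      name = c("Ann Lee", "Bo Chen", "Cy Park"),
                      dose = c(2, 2, 1), stringsAsFactors = FALSE)

diffs <- Acommon != Bcommon

perRecord <- rowSums(diffs)          # number of differing fields per record
badIDs <- Acommon$id[perRecord > 0]  # ids with at least one discrepancy
```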
To find all ids where the name is different:
Acommon$id[diffs[,"name"]]
# [1] 100001
And so on.
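To get every individual discrepancy at once, `which(..., arr.ind = TRUE)` turns the logical matrix into row/column positions, which can then be expanded into a long table of (id, variable, value in A, value in B). A sketch with hypothetical toy data; the column names `valueA`/`valueB` are illustrative, and values are converted to character so that mixed column types fit in one column:

```r
# Hypothetical toy data, already restricted to common ids and sorted by id
Acommon <- data.frame(id = c(100000, 100001),
                      name = c("Ann Lee", "Bo Chan"),
                      dose = c(1, 2), stringsAsFactors = FALSE)
Bcommon <- data.frame(id = c(100000, 100001),
                      name = c("Ann Lee", "Bo Chen"),
                      dose = c(2, 2), stringsAsFactors = FALSE)

diffs <- Acommon != Bcommon

# Row/column positions of every TRUE cell (column-major order)
pos <- which(diffs, arr.ind = TRUE)
rows <- unname(pos[, "row"])
cols <- unname(pos[, "col"])

# Pull the differing value from each data frame as character
cellAt <- function(df) {
  vapply(seq_along(rows), function(k) as.character(df[rows[k], cols[k]]),
         character(1))
}

report <- data.frame(id = Acommon$id[rows],
                     variable = colnames(diffs)[cols],
                     valueA = cellAt(Acommon),
                     valueB = cellAt(Bcommon),
                     stringsAsFactors = FALSE)
report
```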