I would like to compare two data sets and identify specific instances of discrepancies between them (i.e., which variables were different).
While I have found out how t
There is a new package called waldo that can do this:
install.packages("waldo")
library(waldo)
# construct the data frames
df1 <- structure(list(id = 100000:100001, name = structure(c(2L, 1L), .Label = c("Jane Doe","John Doe"), class = "factor"), dob = structure(1:2, .Label = c("1/1/2000", "7/3/2011"), class = "factor"), vaccinedate = structure(c(2L, 1L), .Label = c("3/14/2013", "5/20/2012"), class = "factor"), vaccinename = structure(1:2, .Label = c("MMR", "VARICELLA"), class = "factor"), dose = c(4L, 1L)), .Names = c("id", "name", "dob", "vaccinedate", "vaccinename", "dose"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(id = 100000:100002, name = structure(c(2L, 1L, 3L), .Label = c("Jane Doee", "John Doe", "John Smith"), class = "factor"), dob = structure(c(1L, 3L, 2L), .Label = c("1/1/2000", "2/5/2010", "7/3/2011"), class = "factor"), vaccinedate = structure(c(2L, 1L, 3L), .Label = c("3/24/2013", "5/20/2012", "7/13/2013"), class = "factor"), vaccinename = structure(c(2L, 3L, 1L), .Label = c("HEPB", "MMR", "VARICELLA"), class = "factor"), dose = c(3L, 1L, 3L)), .Names = c("id", "name", "dob", "vaccinedate", "vaccinename", "dose"), class = "data.frame", row.names = c(NA, -3L))
# compare them
compare(df1, df2)
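And we get a report in the following form (note that this sample output comes from a different pair of data frames, with columns X, Y and Z, not from df1 and df2 above):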
`old` is length 2
`new` is length 3
`names(old)`: "X" "Y"
`names(new)`: "X" "Y" "Z"
`attr(old, 'row.names')`: 1 2 3
`attr(new, 'row.names')`: 1 2 3 4
`old$X`: 1 2 3
`new$X`: 1 2 3 4
`old$Y`: "a" "b" "c"
`new$Y`: "A" "b" "c" "d"
`old$Z` is absent
`new$Z` is a character vector ('k', 'l', 'm', 'n')
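If you need the discrepancies as a data frame rather than a printed report, a base R approach along the following lines may help. This is only a minimal sketch, not part of waldo: `diff_cells` is a hypothetical helper name, and it assumes both data frames share a unique key column called `id`. The small `old` and `new` data frames are made-up stand-ins modelled on the question's data. Values are compared as character strings so that factors with different level sets can still be compared.

```r
# Hypothetical helper (not a waldo function): list cell-level differences
# between two data frames that share a unique key column.
diff_cells <- function(old, new, key = "id") {
  # Restrict the comparison to keys and columns present in both data frames
  common_ids  <- intersect(old[[key]], new[[key]])
  common_cols <- setdiff(intersect(names(old), names(new)), key)
  old <- old[match(common_ids, old[[key]]), , drop = FALSE]
  new <- new[match(common_ids, new[[key]]), , drop = FALSE]

  out <- lapply(common_cols, function(col) {
    # Compare as character so factors with different levels are comparable
    a <- as.character(old[[col]])
    b <- as.character(new[[col]])
    # A cell differs if the values differ or exactly one of them is NA
    changed <- which(a != b | is.na(a) != is.na(b))
    data.frame(id = common_ids[changed],
               variable = rep(col, length(changed)),
               old = a[changed],
               new = b[changed],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Made-up example data, modelled on the question
old <- data.frame(id = c(100000L, 100001L),
                  name = c("John Doe", "Jane Doe"),
                  dose = c(4L, 1L),
                  stringsAsFactors = FALSE)
new <- data.frame(id = c(100000L, 100001L, 100002L),
                  name = c("John Doe", "Jane Doee", "John Smith"),
                  dose = c(3L, 1L, 3L),
                  stringsAsFactors = FALSE)

diffs <- diff_cells(old, new)
print(diffs)

# Keys that appear in only one of the two data frames
only_in_new <- setdiff(new$id, old$id)
only_in_old <- setdiff(old$id, new$id)
```

Here `diffs` has one row per differing cell (the key, the variable name, and the old and new values), while `only_in_new` and `only_in_old` cover records that exist on one side only. The `old` and `new` columns come back as character, so convert them back if you need the original types.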