Identifying specific differences between two data sets in R

长发绾君心 2021-02-14 05:52

I would like to compare two data sets and identify specific instances of discrepancies between them (i.e., which variables were different).

While I have found out how t

4 answers
  •  孤城傲影
    2021-02-14 06:18

    There is a new package called waldo:

    install.packages("waldo")
    library(waldo)
    
    # construct the data frames
    
    
    df1 <- structure(list(id = 100000:100001, name = structure(c(2L, 1L), .Label = c("Jane Doe","John Doe"), class = "factor"), dob = structure(1:2, .Label = c("1/1/2000", "7/3/2011"), class = "factor"), vaccinedate = structure(c(2L, 1L), .Label = c("3/14/2013", "5/20/2012"), class = "factor"), vaccinename = structure(1:2, .Label = c("MMR", "VARICELLA"), class = "factor"), dose = c(4L, 1L)), .Names = c("id", "name", "dob", "vaccinedate", "vaccinename", "dose"), class = "data.frame", row.names = c(NA, -2L))
    
    df2 <- structure(list(id = 100000:100002, name = structure(c(2L, 1L, 3L), .Label = c("Jane Doee", "John Doe", "John Smith"), class = "factor"), dob = structure(c(1L, 3L, 2L), .Label = c("1/1/2000", "2/5/2010", "7/3/2011"), class = "factor"), vaccinedate = structure(c(2L, 1L, 3L), .Label = c("3/24/2013", "5/20/2012", "7/13/2013"), class = "factor"), vaccinename = structure(c(2L, 3L, 1L), .Label = c("HEPB", "MMR", "VARICELLA"), class = "factor"), dose = c(3L, 1L, 3L)), .Names = c("id", "name", "dob", "vaccinedate", "vaccinename", "dose"), class = "data.frame", row.names = c(NA, -3L))
    
    # compare them; waldo labels the first argument `old` and the second `new`
    compare(df1, df2)
    

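    The output has the following form. Note that this sample was produced from a different pair of data frames (`old` with columns X and Y, `new` with columns X, Y and Z), not from the df1 and df2 defined above: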

    `old` is length 2
    `new` is length 3
    
    `names(old)`: "X" "Y"    
    `names(new)`: "X" "Y" "Z"
    
    `attr(old, 'row.names')`: 1 2 3  
    `attr(new, 'row.names')`: 1 2 3 4
    
    `old$X`: 1 2 3  
    `new$X`: 1 2 3 4
    
    `old$Y`: "a" "b" "c"    
    `new$Y`: "A" "b" "c" "d"
    
    `old$Z` is absent
    `new$Z` is a character vector ('k', 'l', 'm', 'n')
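
    If the goal is a table of exactly which variables differ for which record, a base R comparison keyed on `id` is one option alongside waldo. The following is a minimal sketch, not part of waldo: `cell_diffs` is a hypothetical helper name, the data frames are plain-character copies of df1 and df2 above (factors decoded by hand), and it assumes `id` uniquely identifies a row in both data sets.

    ```r
    # Plain-character copies of df1 and df2 from above (assumption: factors
    # decoded by hand so values can be compared directly).
    df1 <- data.frame(
      id = 100000:100001,
      name = c("John Doe", "Jane Doe"),
      dob = c("1/1/2000", "7/3/2011"),
      vaccinedate = c("5/20/2012", "3/14/2013"),
      vaccinename = c("MMR", "VARICELLA"),
      dose = c(4L, 1L),
      stringsAsFactors = FALSE
    )
    df2 <- data.frame(
      id = 100000:100002,
      name = c("John Doe", "Jane Doee", "John Smith"),
      dob = c("1/1/2000", "7/3/2011", "2/5/2010"),
      vaccinedate = c("5/20/2012", "3/24/2013", "7/13/2013"),
      vaccinename = c("MMR", "VARICELLA", "HEPB"),
      dose = c(3L, 1L, 3L),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: one row per (key, variable) cell that differs.
    # Assumes `key` is unique within each data frame.
    cell_diffs <- function(old, new, key = "id") {
      # Only records present in both data sets can be compared cell by cell
      shared <- intersect(old[[key]], new[[key]])
      o <- old[match(shared, old[[key]]), , drop = FALSE]
      n <- new[match(shared, new[[key]]), , drop = FALSE]
      # Compare only the columns the two data sets have in common
      vars <- setdiff(intersect(names(old), names(new)), key)
      pieces <- lapply(vars, function(v) {
        ov <- as.character(o[[v]])
        nv <- as.character(n[[v]])
        # Differs if exactly one side is NA, or both are present and unequal
        changed <- (is.na(ov) != is.na(nv)) |
          (!is.na(ov) & !is.na(nv) & ov != nv)
        data.frame(
          key_value = shared[changed],
          variable = rep(v, sum(changed)),
          old = ov[changed],
          new = nv[changed],
          stringsAsFactors = FALSE
        )
      })
      out <- do.call(rbind, pieces)
      names(out)[1] <- key
      out
    }

    diffs <- cell_diffs(df1, df2)
    print(diffs)

    # Records that exist in only one of the two data sets
    only_in_df2 <- setdiff(df2$id, df1$id)
    only_in_df1 <- setdiff(df1$id, df2$id)
    ```

    For these data, `diffs` lists `name` and `vaccinedate` for id 100001 and `dose` for id 100000, while `only_in_df2` holds 100002. Values are compared as character strings, so numeric columns that differ only in formatting would need separate handling.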
    
