In R, find whether two files differ

春和景丽 2021-01-04 07:50

I would like a pure R way to test whether two arbitrary files are different. So, the equivalent to diff -q in Unix, but should work on Windows and without exter

5 answers
  • 2021-01-04 08:30

    Example solution, using the all.equal utility (see https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html):

    filenameForA <- "my_file_A.txt"
    filenameForB <- "my_file_B.txt"
    all.equal(readLines(filenameForA), readLines(filenameForB))
    

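    Note that readLines(filename) reads every line of the file named by filename into a character vector. all.equal then compares the two vectors: it returns TRUE if they match and otherwise a character vector describing the differences, so wrap the call in isTRUE() when you need a single logical value.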

    Read the documentation linked above for the details. If the files are very large, this may not be the best option, because both files are read into memory in full.
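
    For large files, one possible alternative is to compare the raw bytes in fixed-size chunks, so that only a small part of each file is in memory at a time. The sketch below uses only base R; the function name files_differ and the 1 MiB default chunk size are illustrative assumptions, not part of the original answer.

    ```r
    # Hypothetical helper (not from the original answer): returns TRUE if the
    # two files differ byte for byte, FALSE if they are identical.
    files_differ <- function(path_a, path_b, chunk_size = 1024L * 1024L) {
      stopifnot(file.exists(path_a), file.exists(path_b))

      # Files of different sizes cannot be identical, so skip reading them.
      if (file.size(path_a) != file.size(path_b)) {
        return(TRUE)
      }

      # Open both files in binary mode and close them when the function exits.
      con_a <- file(path_a, open = "rb")
      on.exit(close(con_a), add = TRUE)
      con_b <- file(path_b, open = "rb")
      on.exit(close(con_b), add = TRUE)

      repeat {
        chunk_a <- readBin(con_a, what = "raw", n = chunk_size)
        chunk_b <- readBin(con_b, what = "raw", n = chunk_size)
        if (!identical(chunk_a, chunk_b)) {
          return(TRUE)
        }
        # A zero-length read means both files are exhausted without a mismatch.
        if (length(chunk_a) == 0L) {
          return(FALSE)
        }
      }
    }
    ```

    Calling files_differ("my_file_A.txt", "my_file_B.txt") then gives a single TRUE or FALSE, much like the exit status of diff -q. Unlike the readLines approach, this treats files that differ only in line endings as different.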

  • 2021-01-04 08:33

    If the files are too large to read into R's memory, compare their MD5 checksums instead:

    library(tools)
    md5sum("file_1.txt") == md5sum("file_2.txt")
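
    The comparison above returns a named logical value, and it returns NA when md5sum cannot read one of the files. A minimal sketch of a wrapper that handles both points follows; the name same_md5 is a hypothetical helper, not part of the tools package.

    ```r
    # Hypothetical wrapper around tools::md5sum(); the name is illustrative.
    same_md5 <- function(path_a, path_b) {
      # md5sum() accepts a character vector of paths and returns one hash per
      # file; unname() drops the file names attached to the result.
      sums <- unname(tools::md5sum(c(path_a, path_b)))

      # md5sum() yields NA for a file it cannot read; fail loudly rather than
      # letting NA propagate into the comparison.
      if (anyNA(sums)) {
        stop("could not read one or both files")
      }

      sums[1] == sums[2]
    }
    ```

    Equal checksums are strong evidence that the contents match, though a hash comparison is not a formal proof of identity.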
    
  • 2021-01-04 08:34

    I realize this is not exactly what you're asking for, but I'm posting it for the benefit of others who land on this question wanting to see the full diff and who are willing to tolerate an external dependency. In that case, the diffobj package will show the differences as a real diff that works on Windows, using the same algorithm as GNU diff. In this example, we compare the Moby Dick text to a version of it with 5 lines modified:

    library(diffobj)
    diffFile(mob.1.txt, mob.2.txt)   # or `diffChr` if you have the data in R already
    

    Produces: [screenshot of the diff output, not preserved]

    If you want something faster while still getting the locations of the differences, you can get the shortest edit script from the same package:

    ses(readLines(mob.1.txt), readLines(mob.2.txt))
    # [1] "1127c1127"   "2435c2435"   "6417c6417"   "13919c13919"
    

    Code to get the Moby Dick data (note that I didn't set a seed, so you'll get different lines):

    moby.dick.url <- 'http://www.gutenberg.org/files/2701/2701-0.txt'
    moby.dick.raw <- moby.dick.UC <- readLines(moby.dick.url)
    to.UC <- sample(length(moby.dick.raw), 5)
    moby.dick.UC[to.UC] <- toupper(moby.dick.UC[to.UC])
    
    mob.1.txt <- tempfile()
    mob.2.txt <- tempfile()
    
    writeLines(moby.dick.raw, mob.1.txt)
    writeLines(moby.dick.UC, mob.2.txt)
    
  • 2021-01-04 08:46

    The closest to the Unix command is diffr. It shows a side-by-side view of the two files with all the differing lines marked in color.

    library(diffr)
    diffr(filename1, filename2)
    

    Shows: [screenshot of the side-by-side diff, not preserved]

  • 2021-01-04 08:49
    all.equal(readLines(f1), readLines(f2))
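
    Since all.equal returns a description of the differences rather than FALSE when its arguments differ, a variant built on identical() may be more convenient for a yes/no answer. The sketch below is one way to write it; lines_differ is a hypothetical name. Because readLines accepts LF, CRLF, and CR as line terminators, this comparison works on the text lines only and ignores differences in line endings.

    ```r
    # Hypothetical helper: TRUE if the files differ line by line, else FALSE.
    lines_differ <- function(path_a, path_b) {
      # warn = FALSE silences the warning about a missing final newline.
      !identical(
        readLines(path_a, warn = FALSE),
        readLines(path_b, warn = FALSE)
      )
    }
    ```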
    