Line-end agnostic diff?

醉梦人生 2021-01-01 14:19

I'm working on a Mac, with some fairly old files. Different files were created by different programs, so some of them end with \r (Mac) and some with \n (Unix). I want to

7 Answers
  • 2021-01-01 14:21

    The dos2unix command could be helpful for converting your files to a consistent format first. I believe it's available for just about every platform you can think of, including as a package for Mac, and it can run on lots of files at once. Note that plain dos2unix targets \r\n endings; for files that use \r alone, the same package provides mac2unix.

  • 2021-01-01 14:30

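    As Jay said, Diff'nPatch seems to be what you are looking for. Alternatively, you can convert all your '\r' line endings to '\n' with a single command. The commands below use perl rather than sed: the sed that ships with a Mac may not interpret \r and \n in the expression, and -ie is read as -i with a backup suffix of e.

    perl -pi -e 's/\r\n?/\n/g' filename
    

    or

    find . -type f -print0 | xargs -0 perl -pi -e 's/\r\n?/\n/g'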
    

    (You may want to filter the list of files in some way in the latter case or it will be applied to all the files in all subdirectories.)
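
A sketch of the kind of filtering suggested above, under some loud assumptions: the old files are the ones ending in .txt, they contain bare \r or \n only (a \r\n pair would become two newlines), and the helper name convert_cr_tree is made up for illustration. It uses tr with a temporary file rather than an in-place edit:

```shell
# convert_cr_tree DIR: hypothetical helper, not part of any tool.
# Rewrites every regular *.txt file under DIR so that each \r becomes \n.
# Assumption: the files use bare \r or \n, never \r\n.
convert_cr_tree() {
  find "$1" -type f -name '*.txt' | while IFS= read -r f; do
    tmp=$(mktemp) || break
    # Write the converted copy first, then overwrite the original's contents.
    tr '\r' '\n' < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

Calling convert_cr_tree . would then touch only the .txt files below the current directory and leave everything else alone. Trying it on a copy of the tree first is probably wise.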

  • 2021-01-01 14:30

    This worked for me:

    diff -r --ignore-all-space dir1/ dir2/
    

    I am on OS X and have mixed files from OS X and Windows.

    Credit: http://www.codealpha.net/514/diff-and-ignoring-spaces-and-end-of-lines-unix-dos-eol/

  • 2021-01-01 14:38

    The diff utility bundled with OS X v10.7 (Lion) has an option, --strip-trailing-cr, that does what you want. You use it like so:

    diff -cpt a.c b.c --strip-trailing-cr
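
As far as the GNU diff manual describes it, --strip-trailing-cr strips a carriage return at the end of an input line, and diff splits lines on \n, so it covers \r\n files but not files that use \r alone. A sketch of one way to compare such files without modifying them, assuming they contain bare \r or \n only; the helper name eoldiff is hypothetical:

```shell
# eoldiff OLD NEW [diff options...]: hypothetical helper, not a real command.
# Compares temporary copies of OLD and NEW in which every \r is mapped to \n.
# Assumption: no \r\n pairs (those would turn into an extra blank line).
eoldiff() {
  old=$1
  new=$2
  shift 2
  tmp_old=$(mktemp) || return 2
  tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }
  tr '\r' '\n' < "$old" > "$tmp_old"
  tr '\r' '\n' < "$new" > "$tmp_new"
  # Remaining arguments are passed through to diff unchanged.
  diff "$@" "$tmp_old" "$tmp_new"
  status=$?
  rm -f "$tmp_old" "$tmp_new"
  return "$status"
}
```

With this, eoldiff a.c b.c -c would produce a context diff; the file names in the diff header will be those of the temporary copies, not the originals.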
    
  • 2021-01-01 14:42

    If you use diff -w it will ignore whitespace in the files, which is probably sufficient for your needs.

    EDIT: just realized I misread the post the first time and you're actually looking for a diff that will work with \r line endings. My suggestion would be to convert the files with something like flip that can convert the files to a \n standard format.

    EDIT 2: Just found something that looks like what you want - Diff'nPatch:

    Diff'nPatch is a port to the Macintosh of the GNU 'diff', 'patch' and 'cmp' utilities. It lets you compare and find differences between two files or folders, collate two files, generate diffs in various formats (normal, context, unidiff, etc.), apply patches, compare files byte by byte. It can handle any type of line endings (mac, unix or windows)

  • 2021-01-01 14:43

    PhpStorm's diff view's "ignore whitespace" just works. It automatically ignores differences in the carriage return / EOL / newline / what-have-you. You can waste your time fiddling with arcane Unix commands or whatever, or you could just get something that actually works and move forward with life.

    All of the above-mentioned solutions failed on OS X v10.8 (Mountain Lion), including the one marked as the correct answer. All the download links for "Diff'nPatch" failed. (I did find http://webperso.easyconnect.fr/bdesgraupes/tools.html, but I really don't like the idea of having to resort to a diff tool that cannot be invoked from the command line and thus integrated with whatever IDE or version control tool I might be using, like BBEdit, Sourcetree, or SmartSVN, all of which, BTW, failed to ignore newlines with their built-in diff tool.)

    Yes, my newlines are \r, but so what? Arrr! If the software is too stupid to realize that \r == \n then I'm just going to use different software that is smart enough.

    PhpStorm was the only software that had a diff tool that "just worked" -- which is what I expect Mac software to do. I expect Mac software to just work. I use a Mac, so I can do my job instead of learning arcane terminal commands at every turn, which are almost all poorly documented, expecting you to just understand how the commands are supposed to be formatted without any clear examples, so you never know if you're doing it wrong or if the command simply doesn't work just like all other bad software.

    Take this example from "man diff":

       -I RE  --ignore-matching-lines=RE
              Ignore changes whose lines all match RE.
    

    OK, so having read this, I have no idea what it means. There is no example of its usage. What is "RE"? It doesn't say anywhere.
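
For what it is worth, RE in that excerpt stands for a regular expression: a change is dropped from the output when every line it adds or removes matches the expression. A small sketch, with made-up file names and contents, assuming a diff that supports -I (GNU diff does):

```shell
# Two hypothetical config files that differ only in a comment line.
d=$(mktemp -d)
printf '# generated 2020-01-01\nvalue = 1\n' > "$d/a.conf"
printf '# generated 2021-06-30\nvalue = 1\n' > "$d/b.conf"

# Plain diff reports the changed comment line and exits with status 1.
diff "$d/a.conf" "$d/b.conf" || echo "plain diff: files differ"

# With -I '^#', the only change consists of lines matching ^#, so it is
# ignored: diff prints nothing and exits with status 0.
diff -I '^#' "$d/a.conf" "$d/b.conf" && echo "with -I: no differences reported"
```

If the two files also differed in the value line, that change would still be reported, since not all of its lines match the expression.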

    Then there's this jewel:

      --GTYPE-group-format=GFMT
              Similar, but format GTYPE input groups with GFMT.
    
       --line-format=LFMT
              Similar, but format all input lines with LFMT.
    
       --LTYPE-line-format=LFMT
              Similar, but format LTYPE input lines with LFMT.
    
       LTYPE is `old', `new', or `unchanged'.
              GTYPE is LTYPE or `changed'.
    
              GFMT may contain:
    
       %<     lines from FILE1
    
       %>     lines from FILE2
    
       %=     lines common to FILE1 and FILE2
    
       %[-][WIDTH][.[PREC]]{doxX}LETTER
              printf-style spec for LETTER
    
              LETTERs are as follows for new group, lower case for old group:
    
       F      first line number
    
       L      last line number
    
       N      number of lines = L-F+1
    
       E      F-1
    
       M      L+1
    
              LFMT may contain:
    
       %L     contents of line
    
       %l     contents of line, excluding any trailing newline
    
       %[-][WIDTH][.[PREC]]{doxX}n
              printf-style spec for input line number
    
              Either GFMT or LFMT may contain:
    
       %%     %
    
       %c'C'  the single character C
    
       %c'\OOO'
              the character with octal code OOO
    

    I could make no sense whatsoever of this passage. What is the "input"? Is it both files or just the "to" file or just the "from" file? What is "similar" referring to? What does "is" mean in the sentence, "GTYPE is LTYPE or `changed'"? Does it mean "may be replaced by"? If so, then why isn't "GTYPE" in quotation marks, brackets, etc.? Since no example is given, there is no way to know; the documentation's wording is totally ambiguous. What does "GFMT may contain" mean? Does "contain" mean that the text replacing the acronym GFMT may contain that? Without a clear example it's completely useless.
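
One possible reading of that excerpt, sketched as a hedged example rather than an authoritative explanation: LTYPE is a placeholder to be replaced by old, new, or unchanged in the option name, LFMT is a format string you supply, and %L inside it stands for the line's contents including its newline. This assumes GNU diff; the wrapper name marked_diff and the sample files are made up for illustration:

```shell
# marked_diff OLD NEW: hypothetical wrapper around GNU diff's line formats.
# Lines only in OLD get "- ", lines only in NEW get "+ ", and lines common
# to both get two spaces.
marked_diff() {
  diff --old-line-format='- %L' \
       --new-line-format='+ %L' \
       --unchanged-line-format='  %L' \
       "$1" "$2"
}

# Example: two three-line files that differ in the middle line.
d=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$d/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$d/new.txt"
marked_diff "$d/old.txt" "$d/new.txt" || true
# Prints:
#   alpha
# - beta
# + BETA
#   gamma
```

On that reading, "input lines" are lines taken from either of the two files being compared, with FILE1 the first file named on the command line and FILE2 the second.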

    Why even bother to write a man page if you're going to make it so cryptic and ambiguous it's useless to anyone who doesn't already know how to use the software, basically? At that point, it's not a manual; it's just a quick-reference page for the guys who wrote the software so they can remember how to use it. I guess they assume you'll just read the source-code itself if you want to know what it actually does.

    My time is valuable. I'd rather just pay the money to have a piece of software that actually works correctly and has proper documentation.

    Because these all failed:

     diff -d --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml
    

    ...failed to ignore \r characters.

     diff -wd --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml
    

    ...failed to ignore \r characters.

     diff -wd --suppress-common-lines --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml
    

    ...failed to ignore \r characters.

     diff -wd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space
    

    ...failed to ignore \r characters.

     diff -awd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space
    

    ...failed to ignore \r characters.

    For that matter, it fails in the same way when the characters added are \n rather than \r.

    Where test.phtml ==

    foo

    bar

    and rest.phtml ==

    foobar

    The "diff" command always gives you something like:


    *** 1,2 ****
    ! foo
    ! bar
    \ No newline at end of file
    --- 1 ----
    ! foobar
    \ No newline at end of file

    ... fail!
