How to make git understand Mac (CR) line endings

谎友^ 2021-01-19 15:12

For some reason, one of my files contains old-style Mac line endings (after editing on OS X). These are "CR" (carriage return) characters and show up as ^M in git diff.

2 answers
  • 2021-01-19 15:50

    TL;DR

    Create a filter driver plus .gitattributes: create a smudge filter that runs tr '\n' '\r' and a clean filter that runs tr '\r' '\n', and mark the file(s) in question as using this filter. Store the file inside Git using LF-only line endings. (The filter driver is defined in a .git/config or $HOME/.gitconfig file and the names or name-patterns for the files go in .gitattributes.)

    Long

    As you have seen, Git strongly prefers newline-terminated lines. (It can work with newline-separated lines, where the last line is missing the terminator, but this means that adding a line results in a change to the previous final line, since it now has a newline terminator while the new final line is missing the newline terminator.) This does not matter for the individual snapshots, but does matter for producing useful diffs.
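
    The effect on diffs can be seen with plain diff, outside Git. This is a minimal sketch: the temporary directory and the file names are made up for illustration. It appends one line to a file twice, once with newline-terminated lines and once with newline-separated lines, and counts the lines that diff reports as changed.

    ```shell
    tmp=$(mktemp -d)

    # Newline-terminated: every line, including the last, ends with \n.
    printf 'one\ntwo\n' > "$tmp/term_old.txt"
    printf 'one\ntwo\nthree\n' > "$tmp/term_new.txt"

    # Newline-separated: the last line has no trailing \n.
    printf 'one\ntwo' > "$tmp/sep_old.txt"
    printf 'one\ntwo\nthree' > "$tmp/sep_new.txt"

    # Count the lines diff marks with < or >.
    # diff exits with status 1 when the files differ, hence the "|| true".
    terminated=$(diff "$tmp/term_old.txt" "$tmp/term_new.txt" | grep -c '^[<>]' || true)
    separated=$(diff "$tmp/sep_old.txt" "$tmp/sep_new.txt" | grep -c '^[<>]' || true)

    echo "terminated: $terminated changed line(s)"
    echo "separated:  $separated changed line(s)"
    ```

    In the separated case, diff should also report the old final line as changed, because it gained a newline terminator.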

    Modern macOS uses newlines, like everyone else. Only ancient backwards-compatible formats have CR-only line endings. See, e.g., this SuperUser Stack Exchange web site posting.

    Git does not have a built in filter for converting to or from such line endings. Git does, however, have a general purpose mechanism for making alterations in work-tree files.

    Remember that when Git stores any file in a snapshot, the file is represented by what Git calls a blob object, which is stored internally in a special, compressed (sometimes highly compressed), Git-only form. This form is not useful to anything but Git, so when you get the files in a useful form (via git checkout, for instance), Git expands them into their usual form for your computer. Conversely, any time you take a normal file and hand it to Git, Git compresses it down to its Git-only form. That happens whenever you copy a file back into Git's index using git add.

    The index copy of each file exists while you have the work-tree in place, just like the committed copy. The index copy is in the same Git-only format. The key difference here is that the committed copy can't be changed, but the index copy can be changed. Running git commit takes a snapshot of whatever is in the index right at that point, and makes that the new snapshot for the new commit. Hence the index acts as what will go into the next commit. Using git checkout, you copy some existing commit into the index, and have Git expand it into the work-tree; then using git add, you selectively replace particular index copies with compressed versions of the work-tree files that you have changed.

    This copying, to or from index and work-tree, is the ideal point at which to do Windows-style LF-to-CRLF conversions, or vice versa, so this is where Git does it. If you have some other conversion to perform, not directly built in to Git, this is where you tell Git to do it.

    Smudge and clean filters

    A smudge filter is one that Git applies when converting a file from compressed index copy to work-tree copy. Here, if you've chosen to have newline characters replaced with Windows-style CRLF line endings, Git has an internal converter that will do that, selected with eol=crlf. A clean filter is one that Git applies when converting a file from uncompressed work-tree copy to compressed index copy; here again, eol=crlf directs Git to do the backwards conversion.

    If you want to replace newlines with CR-only, you must invent your own converters. Let's say you call the overall process convert-cr:

    *.csv   filter=convert-cr
    

    (instead of *.csv eol=crlf). This line goes into .gitattributes (which is a commit-able file, and you should commit it).
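
    To confirm that the pattern matches the files you expect, git check-attr reports the attributes Git assigns to a path. A minimal sketch, assuming a scratch repository in a temporary directory and a hypothetical file name data.csv:

    ```shell
    repo=$(mktemp -d)
    git init -q "$repo"
    printf '*.csv   filter=convert-cr\n' > "$repo/.gitattributes"

    # The path does not need to exist; attributes are matched by name.
    attr=$(git -C "$repo" check-attr filter -- data.csv)
    echo "$attr"
    ```

    If the pattern matches, the reported value of the filter attribute should be convert-cr.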

    Now you must define the convert-cr filter. This goes in a Git configuration file, and here we find a minor flaw: the configuration file is not commit-able. This is a security issue: Git will run arbitrary commands here, and if I could commit this file and you clone it, you'll run the commands I specify, without getting a chance to vet them first. So you must put this into your .git/config yourself, or into your global configuration (git config --global --edit for instance):

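    [filter "convert-cr"]
        # Backslashes are doubled because Git's config parser handles escapes itself
        # and rejects \r; tr then receives '\r' and '\n'.
        clean = tr '\\r' '\\n'
        smudge = tr '\\n' '\\r'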
    

    Now whenever Git converts from Git-only format, it will translate the newlines to CRs, and whenever Git converts to Git-only format, it will translate the CRs to newlines.
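
    The round trip can be checked in a throwaway repository. This is a sketch under assumptions: the repository lives in a temporary directory, data.csv is a made-up file, and the filter is set with git config from the shell (where single backslashes are fine, since Git escapes them when writing the config file). It uses git checkout-index, the plumbing command that writes a file from the index to the work-tree, so that no commit is needed.

    ```shell
    repo=$(mktemp -d)
    git init -q "$repo"
    git -C "$repo" config filter.convert-cr.clean "tr '\r' '\n'"
    git -C "$repo" config filter.convert-cr.smudge "tr '\n' '\r'"
    printf '*.csv filter=convert-cr\n' > "$repo/.gitattributes"

    # A CR-only file in the work-tree.
    printf 'a,b\r1,2\r' > "$repo/data.csv"

    # git add runs the clean filter; dump the index copy as hex bytes.
    git -C "$repo" add data.csv
    stored=$(git -C "$repo" cat-file blob :data.csv | od -An -tx1 | tr -d ' \n')

    # Writing the file back out of the index runs the smudge filter.
    rm "$repo/data.csv"
    git -C "$repo" checkout-index -- data.csv
    worktree=$(od -An -tx1 < "$repo/data.csv" | tr -d ' \n')

    echo "index copy:     $stored"
    echo "work-tree copy: $worktree"
    ```

    The index copy should contain 0a (LF) bytes where the work-tree copy contains 0d (CR) bytes.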

    This does not help with existing stored files

    Any existing snapshots that you have today that have \r inside them, are stored that way forever. Git will never change any existing stored file! Stored data are precious and inviolate. There is nothing you can do about this. Well, there is almost nothing: you can throw out those commits entirely, making new and improved commits that you use instead. But that's quite painful: every commit remembers its parent commits, so if you replace an early commit in your repository, you must replace every child, grandchild, and so on, so that they all remember this new sequence of commits. (git filter-branch does this job.)

    You can, however, instruct Git about how to diff particular files in existing commits, also using .gitattributes and diff drivers. There are multiple ways to do this, but the simplest is to define a textconv attribute that turns a "binary" file (such as a file whose stored version might have CR-only characters) into a text (line-oriented, i.e., newline-based) file. The textconv command performs the same translation as the clean filter, CR to newline, since the stored data contain CRs. Note that a textconv command receives the file name as an argument rather than reading standard input.

    For further details, see the gitattributes documentation.

  • 2021-01-19 16:06

    Since the accepted answer, a new way to do this has been introduced.

    You can teach git diff and git log to run the file through a special command before creating the diff. This is a one-way process, which is just used for generating diffs, and doesn't affect how the files are stored on disk or in your repository.

    Create a new diff driver called "cr", which runs the file through tr before calculating the diff. In your .git/config:

    [diff "cr"]
        textconv = tr '\\r' '\\n' <
    

    Alternatively:

    git config diff.cr.textconv "tr '\r' '\n' <"
    

    Then tell git to use it using your .gitattributes (e.g. for all .csv files):

    *.csv diff=cr
    

    Note that this only affects diffs. It won't help you with merging!
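
    A quick way to see the driver at work is a scratch repository where the file is stored with CRs and no clean filter. This is a sketch, assuming a temporary directory and a hypothetical data.csv; it compares the index with the work-tree, so no commit is required.

    ```shell
    repo=$(mktemp -d)
    git init -q "$repo"
    git -C "$repo" config diff.cr.textconv "tr '\r' '\n' <"
    printf '*.csv diff=cr\n' > "$repo/.gitattributes"

    # Stage a CR-only file, then append one record in the work-tree.
    printf 'a,b\r1,2\r' > "$repo/data.csv"
    git -C "$repo" add data.csv
    printf 'a,b\r1,2\r3,4\r' > "$repo/data.csv"

    # With the driver, the diff is computed on the converted, line-oriented text.
    added=$(git -C "$repo" diff -- data.csv | grep -c '^+3,4$' || true)
    echo "added lines matching 3,4: $added"
    ```

    The stored blob still contains CRs; only the text handed to the diff machinery is converted.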
