`git` shows changed files after cloning, without any other actions

Posted by 对着背影说爱祢 on 2019-12-06 04:19:16

This happens precisely because those files are committed with CRLF endings, yet the .gitattributes file says to commit them with LF-only endings.

Git can and will do CRLF-vs-LF-only conversion in two places:

  • During extraction from index to work-tree. A file stored in a commit or in the index is always assumed to be in a "clean" state, but when extracting that file from the index, to the work-tree, Git should apply any conversions directed by .gitattributes in the form of "change LF-only to CRLF", for instance, and also in the form of what Git calls smudge filters.

  • During the copy of a file from work-tree back to index. A file stored in the work-tree is in the "smudged" state, so at this point, Git should apply any "cleaning" conversions: for instance, change CR-LF to LF-only, and apply any clean filters.

Note that there are two points at which these conversions can occur. This does not mean that they will occur at both points, just that these are the two possible places. As the .gitattributes documentation notes, the actual conversions are:

  • eol=lf: none on index -> work-tree; CR-LF to LF-only on work-tree -> index
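  • eol=crlf: LF-only to CR-LF on index -> work-tree; CR-LF to LF-only on work-tree -> index (setting eol implies the text attribute, so line endings are still normalized to LF on the way in)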
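The eol=lf case, which is the one that matters for this question, can be sketched as a pair of byte-level functions. This is a simplified model rather than Git's actual implementation: the function names are made up for illustration, and it ignores clean/smudge filters, binary detection, and core.safecrlf.

```python
def to_index_eol_lf(worktree_bytes: bytes) -> bytes:
    """Work-tree -> index ("clean") under eol=lf: CR-LF becomes LF-only."""
    return worktree_bytes.replace(b"\r\n", b"\n")


def to_worktree_eol_lf(index_bytes: bytes) -> bytes:
    """Index -> work-tree ("smudge") under eol=lf: no conversion at all."""
    return index_bytes


# A blob that was committed with CR-LF endings (hypothetical content).
committed = b"line one\r\nline two\r\n"

checked_out = to_worktree_eol_lf(committed)  # extraction leaves CR-LF alone
re_added = to_index_eol_lf(checked_out)      # adding strips the CRs

print(checked_out == committed)  # True: the work-tree copy matches the commit
print(re_added == committed)     # False: what would be added differs
```

The asymmetry is the whole point: only one of the two directions converts, so a CR-LF blob does not survive the round trip.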

Now, a file that's actually in the repository, stored in a commit, is purely read-only. It can never change inside that commit. More precisely, the commit identifies (by hash ID) a tree that identifies (by hash ID) a blob that has whatever contents it has. These hash IDs are themselves cryptographic checksums of the object contents, so they are naturally all read-only: if we try to change the contents, what we get is instead a new, different object with a new, different hash ID.
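To make the "different contents, different hash ID" point concrete, here is a small sketch that computes a blob's ID the way Git does for repositories using the default SHA-1 object format: the hash covers a short header ("blob", the content length, and a NUL byte) followed by the raw content. The helper name blob_id is an assumption for illustration, and repositories created with the SHA-256 object format would produce different IDs.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash ID of a blob in a SHA-1 repository: sha1(b"blob <len>\\0" + content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same text with different line endings is two different blobs.
crlf_id = blob_id(b"hello\r\n")
lf_id = blob_id(b"hello\n")

print(crlf_id)
print(lf_id)
print(crlf_id != lf_id)  # True
```

Because the carriage return is part of the hashed bytes, a CR-LF file and its LF-only twin can never share a hash ID, which is why the index sees them as different files.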

Because git checkout actually works by copying the raw hash IDs from the commit's tree(s) to the index, the versions of files stored in the index are necessarily identical to those stored in the commit.

Hence, if somehow—regardless of the how—the committed files are in a form that disagrees with what .gitattributes directs Git to do, the files will become "dirty" in the work-tree regardless of the fact that you haven't done anything to them! If you were to git add the three files in question, that would copy them from work-tree to index, and hence delete the carriage-returns from their line endings. Hence they are, in git status terms, modified but not yet staged for commit.

Stripping out the carriage returns in the work-tree versions leaves them in the same state: they are still modified with respect to what's in the index, because git add would now store their LF-only line endings unchanged, producing new files that differ from the CR-LF versions currently in the index.
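The reasoning in the last two paragraphs can be checked with a toy model of the comparison behind git status, assuming eol=lf is in effect. The function names are hypothetical, and real Git also consults cached stat data before it compares any content, so treat this as a sketch of the logic only.

```python
def clean_eol_lf(data: bytes) -> bytes:
    """What git add would store under eol=lf: CR-LF becomes LF-only."""
    return data.replace(b"\r\n", b"\n")


def is_modified(index_blob: bytes, worktree_file: bytes) -> bool:
    """A file counts as modified when its cleaned form differs from the index."""
    return clean_eol_lf(worktree_file) != index_blob


crlf_in_index = b"a\r\nb\r\n"  # the problematic committed state
lf_in_index = b"a\nb\n"        # the state .gitattributes expects

# CR-LF in the index: modified whether or not the work-tree CRs are stripped.
print(is_modified(crlf_in_index, b"a\r\nb\r\n"))  # True
print(is_modified(crlf_in_index, b"a\nb\n"))      # True

# LF-only in the index: clean for either work-tree form.
print(is_modified(lf_in_index, b"a\r\nb\r\n"))    # False
print(is_modified(lf_in_index, b"a\nb\n"))        # False
```

In this model no work-tree edit to the line endings can make a CR-LF index entry look clean; only changing what is committed, or changing the attributes, does.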

A more interesting question is: How did they get into the commit(s) in the wrong state? This is not something we can answer: only those who made those commits can produce that answer. We can only speculate. One way to achieve this is to add and commit the files without a .gitattributes in effect, then to set the .gitattributes into effect without git add-ing the files again. This way, the CR-LF endings get into someone's index and hence get into that user's commits, even though the .gitattributes file now says (but did not earlier say) that any new git add should strip away the carriage returns.

Changing core.autocrlf has no effect on the status of these files

It should, but only after cloning again:

git config --global core.autocrlf false

git clone git@github.com:erocarrera/pydot pydot2
cd pydot2
git status

That deactivates core.autocrlf globally, which is acceptable here only because this is a test.

Thanks to @torek for the explanation (which agrees with my conjecture).

In summary, the asymmetric git configuration leads to commit(checkout(Index)) not being the identity mapping. With CRLF in the index, this particular configuration checked out CRLF, but after the input transformations in effect (eol=lf), git would commit LF instead of CRLF.

The root cause of this confusion was comparing the:

  • file I see in the working directory, with the
  • committed file.

This doesn't show whether the file has changed. What one should compare is what git will commit after applying the input transformations with what is already committed. Clearly, if those two items differ, then the file has changed.

Following this reasoning, one could declare the repository "unstable", in that it regards itself as modified in the absence of any interaction with the world. This argues for avoiding this state, either by changing the committed files to LF or by changing the .gitattributes (I prefer committing LF).
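As a rough illustration of what "changing the committed files to LF" amounts to, the sketch below takes a mapping from path to committed content, reports which entries would be unstable under eol=lf, and shows that none remain once the content is normalized. The file names and contents are invented for the example, and the helpers are hypothetical, not part of Git.

```python
from typing import Dict, List


def blobs_needing_normalization(blobs: Dict[str, bytes]) -> List[str]:
    """Paths whose committed content still contains CR-LF line endings."""
    return sorted(path for path, data in blobs.items() if b"\r\n" in data)


def normalize(blobs: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a copy of the mapping with every CR-LF replaced by LF-only."""
    return {path: data.replace(b"\r\n", b"\n") for path, data in blobs.items()}


# Hypothetical committed contents, for illustration only.
committed = {
    "README.md": b"title\n",
    "setup.py": b"import os\r\n",
}

print(blobs_needing_normalization(committed))             # ['setup.py']
print(blobs_needing_normalization(normalize(committed)))  # []
```

In a real repository, one way to get the same effect is to run git add --renormalize . (available in Git 2.16 and later) and commit the result.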

In this situation, git would commit LF for both LF and CRLF in the working directory, so dos2unix and unix2dos would have no effect on the commit outcome, and therefore none on the file's status either.
