Does Git store diff information in commit objects?

Anonymous (unverified), submitted 2019-12-03 02:45:02

Question:

According to this:

It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.

Yet when I run git show $SHA1ofCommitObject...

```
commit 4405aa474fff8247607d0bf599e054173da84113
Author: Joe Smoe <joe.smoe@example.com>
Date:   Tue May 1 08:48:21 2012 -0500

    First commit

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..de8b69b
--- /dev/null
+++ b/index.html
@@ -0,0 +1 @@
+<h1>Hello World!</h1>
diff --git a/interests/chess.html b/interests/chess.html
new file mode 100644
index 0000000..e5be7dd
--- /dev/null
+++ b/interests/chess.html
@@ -0,0 +1 @@
+Did you see on Slashdot that King's Gambit accepted is solved! <a href="http://game
```

... it outputs the diff between the commit and its parent. I know that Git doesn't store diffs in blob objects, but does it store them in commit objects? Or is git show calculating the diff dynamically?

Answer 1:

No, commit objects in git don't contain diffs. Instead, each commit object contains the hash of a tree, which recursively and completely defines the content of the source tree at that commit. There's a nice explanation in the git community book of what goes into blob objects, tree objects and commit objects.
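To make this concrete, here is a minimal Python sketch of what a commit object holds and how Git names objects. It assumes Git's loose-object naming scheme (SHA-1 over a "<type> <size>\0" header followed by the body). The commit below is illustrative only: it uses the empty tree's id as a placeholder tree and will not hash to the commit id in the question. git_object_id and parse_commit are hypothetical helper names. On a real repository, git cat-file -p $SHA1ofCommitObject prints the same kind of plain-text body.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 Git uses to name an object: hash of '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def parse_commit(body: bytes) -> dict:
    """Split a commit body into header fields and message. There is no diff to find."""
    headers, _, message = body.decode().partition("\n\n")
    fields = {}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)
    return {"headers": fields, "message": message}

# Illustrative root commit (no "parent" line). The tree id is a placeholder
# (the empty tree); a real commit would name the tree holding index.html etc.
commit_body = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Joe Smoe <joe.smoe@example.com> 1335880101 -0500\n"
    "committer Joe Smoe <joe.smoe@example.com> 1335880101 -0500\n"
    "\n"
    "First commit\n"
).encode()

print(parse_commit(commit_body)["headers"])
print(git_object_id("commit", commit_body))
```

The only content reference in the commit is the tree id; everything else is metadata and the message.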

All the diffs that are shown to you by git's tools are calculated on demand from the complete content of files.
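The on-demand calculation can be modeled as comparing two complete snapshots. This sketch uses Python's difflib as a stand-in for Git's own diff engine (which defaults to the Myers algorithm) and represents a snapshot as a plain dict of path to file text; diff_snapshots is a hypothetical name. For a root commit such as "First commit", the comparison is against an empty tree, which is why every line appears as an addition.

```python
import difflib

def diff_snapshots(old: dict, new: dict) -> list:
    """Compare two full snapshots (path -> text) and return unified-diff lines.
    Nothing here was stored as a diff; it is derived from complete file contents."""
    out = []
    for path in sorted(old.keys() | new.keys()):
        a, b = old.get(path), new.get(path)
        if a == b:
            continue  # unchanged file: nothing to show
        out.extend(difflib.unified_diff(
            [] if a is None else a.splitlines(keepends=True),
            [] if b is None else b.splitlines(keepends=True),
            fromfile="/dev/null" if a is None else f"a/{path}",
            tofile="/dev/null" if b is None else f"b/{path}",
        ))
    return out

parent = {}  # root commit: compared against an empty tree
child = {"index.html": "<h1>Hello World!</h1>\n"}
print("".join(diff_snapshots(parent, child)), end="")
```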



Answer 2:

What the statement means is that most other version control systems need a point of reference in the past to be able to re-create the current commit.

For example, at some point in the past, a diff-based VCS (version control system) would have stored a full snapshot:

```
x = snapshot
+ = diff

History: x-----+-----+-----+-----(+)  <- where we are now
```

So, in such a scenario, to re-create the state at (now), it would have to check out (x) and then apply the diff for each (+) in turn until it reaches (now). Note that replaying an ever-growing chain of deltas would be extremely inefficient, so every so often, delta-based VCSes store a full snapshot. Here's how it's done for Subversion.
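A delta-based history can be sketched like this: one full snapshot, then a chain of deltas that each only make sense relative to the version before them. The copy/insert delta format and the helper names (make_delta, apply_delta, checkout) are assumptions for illustration, not how Subversion or any other tool actually encodes its deltas.

```python
from difflib import SequenceMatcher

def make_delta(base: list, target: list) -> list:
    """Encode target as copy/insert instructions relative to base (lists of lines)."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=base, b=target, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # lines only present in target
        # "delete": emit nothing; those base lines are simply not copied
    return ops

def apply_delta(base: list, ops: list) -> list:
    """Rebuild a version from its predecessor. Without base, ops are useless."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

def checkout(snapshot: list, deltas: list) -> list:
    """Start at the full snapshot (x) and apply every delta (+) in order."""
    state = snapshot
    for ops in deltas:
        state = apply_delta(state, ops)
    return state

versions = [["a\n", "b\n"], ["a\n", "B\n"], ["a\n", "B\n", "c\n"]]
deltas = [make_delta(versions[i], versions[i + 1]) for i in range(len(versions) - 1)]
print(checkout(versions[0], deltas))
```

checkout has to walk the whole chain, which is exactly why such systems periodically store a fresh full snapshot: it caps the number of deltas to replay.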

Now, git is different. Git stores references to complete blobs, which means that with git, a single commit is enough to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.
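A toy content-addressed store shows why one commit is enough: checkout follows commit, then tree, then blobs, and never reads a parent. ObjectStore and its serialization (repr of the payload) are hypothetical simplifications; real Git trees use a binary format with file modes and nested subtrees.

```python
import hashlib

class ObjectStore:
    """Toy object database: id -> (type, payload). Not Git's on-disk format."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type, payload):
        data = repr(payload).encode()  # simplified serialization (assumption)
        oid = hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()
        self.objects[oid] = (obj_type, payload)
        return oid

    def commit(self, files, parent=None):
        """Store every file as a complete blob, then a tree, then the commit."""
        tree = {path: self.put("blob", text) for path, text in files.items()}
        tree_id = self.put("tree", sorted(tree.items()))
        return self.put("commit", {"tree": tree_id, "parent": parent})

    def checkout(self, commit_id):
        """Rebuild all files from ONE commit. The parent field is never followed."""
        _, commit = self.objects[commit_id]
        _, entries = self.objects[commit["tree"]]
        return {path: self.objects[blob_id][1] for path, blob_id in entries}

store = ObjectStore()
c1 = store.commit({"index.html": "<h1>Hello World!</h1>\n"})
c2 = store.commit({"index.html": "<h1>Hello World!</h1>\n", "about.html": "hi\n"}, parent=c1)
print(store.checkout(c2))
```

Because objects are named by their content, the unchanged index.html in the second commit points at the same blob id, so it is stored only once even before any packing happens.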

So if that is the case, then where does the delta compression that git uses come in?

Well, it is nothing but a compression concept: there is no point storing the same information twice if only a tiny amount has changed. So Git stores an object as a delta against a similar object, and because that delta only refers to another object, the commit it belongs to (which is in effect a tree of references) can still be re-created without looking at past commits. The thing is, though, that Git does not do this immediately after every commit, but rather during a garbage collection run. So if Git has not run its garbage collection, you can find loose objects with very similar content in your object database (.git/objects).
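Delta compression inside a pack can be sketched in the same spirit, with one key difference from the delta-based VCS described above: the delta base is just another object, picked because it is similar, not because it belongs to the previous commit. Resolving an object may follow a chain of deltas, but never commit history. The Pack class, the copy/insert delta format and the string ids standing in for SHA-1s are hypothetical; Git's real pack format uses its own binary delta encoding. Git typically keeps the most recent version of a file whole and stores older versions as deltas, so the sketch does the same.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Copy/insert instructions that rebuild target from base (byte strings)."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=base, b=target, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # replace/insert: bytes that only exist in target
            ops.append(("insert", target[j1:j2]))
    return ops

class Pack:
    """Toy pack: each entry is ("full", data) or ("delta", base_id, ops)."""

    def __init__(self):
        self.entries = {}

    def add_full(self, oid, data):
        self.entries[oid] = ("full", data)

    def add_delta(self, oid, base_id, data):
        self.entries[oid] = ("delta", base_id, make_delta(self.read(base_id), data))

    def read(self, oid):
        entry = self.entries[oid]
        if entry[0] == "full":
            return entry[1]
        base = self.read(entry[1])  # follow the delta chain to another OBJECT
        out = bytearray()
        for op in entry[2]:
            out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
        return bytes(out)

old = b"<h1>Hello World!</h1>\n"
new = b"<h1>Hello World!</h1>\n<p>Welcome</p>\n"
pack = Pack()
pack.add_full("new-blob", new)             # newest version stored whole
pack.add_delta("old-blob", "new-blob", old)  # older version stored as a delta
print(pack.read("old-blob"))
```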

However, when Git runs its garbage collection (or when you call git gc manually), the loose objects are packed into a read-only pack file, in which similar objects are stored as deltas against one another. You don't have to worry about running garbage collection manually: Git contains heuristics that tell it when to do so.
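The heuristic is roughly this: commands such as git commit and git fetch trigger an automatic check (git gc --auto, or git maintenance run --auto in newer versions), which estimates the number of loose objects by counting the files in a single fan-out directory, .git/objects/17, and compares that with the gc.auto setting (default 6700). It also compares the number of pack files with gc.autoPackLimit (default 50). The Python below is a sketch modeled on the loose-object part of that check only, not Git's actual code; too_many_loose_objects is a hypothetical name.

```python
import os

HEX = set("0123456789abcdef")

def too_many_loose_objects(objects_dir: str, gc_auto: int = 6700) -> bool:
    """Estimate whether auto-gc should run by sampling one fan-out directory.
    SHA-1 ids are evenly distributed, so objects/17 holds about 1/256 of
    all loose objects."""
    if gc_auto <= 0:  # gc.auto = 0 disables automatic gc
        return False
    threshold = -(-gc_auto // 256)  # ceil(gc_auto / 256)
    sample = os.path.join(objects_dir, "17")
    if not os.path.isdir(sample):
        return False
    count = 0
    for name in os.listdir(sample):
        # loose object files are named by the remaining 38 hex digits of the id
        if len(name) == 38 and set(name) <= HEX:
            count += 1
            if count > threshold:
                return True
    return False

print(too_many_loose_objects(os.path.join(".git", "objects")))
```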


