How does Git record (or more likely, represent) file paths and names for its blobs, and then identify renames?

心在旅途 2021-01-12 08:01

I'm trying to get my head around the way that git manages to 'remember' a file's name and its path, given that it only stores file content within a blob. Is the explanat

2 Answers
  • 2021-01-12 08:20

    Git defines four kinds of objects: commit, tag, tree, and blob. Each object is identified by the hash of its content.

    Three of these object types are involved in a rename:

    1. blob: corresponds to a committed file; the object holds the compressed content of the original file, with no file name or path

    2. tree: corresponds to a directory listing; it maps file names to other objects (either blobs or trees) and also records each entry's access rights (file mode)

    3. commit: contains the commit message, a pointer to the parent commit(s) (except for the first commit), and a pointer to a tree object

    So when you rename a file and commit the change, a new tree object is created (more than one if the file is in a subdirectory) that maps the new name to the object, but the blob itself stays the same.
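A minimal sketch of that point, assuming Git's usual SHA-1 object hashing (a `<type> <size>\0` header followed by the body) and a flat directory of regular files with mode `100644`. The helper names `git_object_id`, `blob_id`, and `tree_id` are made up for illustration and are not part of any Git API.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash a Git object: SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # The blob holds only file content; no name or path is involved.
    return git_object_id("blob", content)


def tree_id(files: dict) -> str:
    """Hash a flat directory of regular files (hypothetical helper).

    Each entry is "<mode> <name>\\0" followed by the raw 20-byte blob id,
    so the file name lives in the tree, not in the blob.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        raw_id = bytes.fromhex(blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\x00" + raw_id
    return git_object_id("tree", body)


content = b"hello\n"
before = {"old_name.txt": content}
after = {"new_name.txt": content}

# Same content, so the blob id is unchanged by the rename.
print(blob_id(before["old_name.txt"]) == blob_id(after["new_name.txt"]))  # True
# The name is part of the tree entry, so the tree id changes.
print(tree_id(before) == tree_id(after))  # False
```

The sketch skips subdirectories, where each nested directory would be its own tree object referenced from its parent.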

    However, Git does not track renames; it tries to rediscover them by comparing file contents. If two files with different names are sufficiently similar, it treats the change as a rename. This can be time-consuming, and when there are a lot of files the detection can fail.
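The idea of rediscovering renames from content can be sketched roughly as follows. This is not Git's actual algorithm: it scores similarity with Python's `difflib`, and `detect_renames` is a hypothetical helper. The 0.5 default threshold is chosen to mirror Git's default 50% similarity index.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file text. Scoring uses difflib,
    which is an assumption of this sketch, not what Git does internally.
    """
    renames = []
    unmatched = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in unmatched.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, best_score))
            del unmatched[best_path]  # each added file is used at most once
    return renames


deleted = {"a.txt": "line one\nline two\nline three\n"}
added = {"b.txt": "line one\nline two\nline 3\n", "c.txt": "zzzz"}

for old, new, score in detect_renames(deleted, added):
    print(f"{old} -> {new} ({score:.0%} similar)")
```

Every deleted file is compared against every added file, so the work grows with the product of the two counts. That matches the remark above about cost, and Git bounds it with the `diff.renameLimit` setting.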

    Edit: Take a look at the Git Community Book, which has a really good explanation of how Git stores information.

  • 2021-01-12 08:29

    Why does git not "track" renames?

    Git has to interoperate with a lot of different workflows. For example, some changes can come from patches, where rename information may not be available. Relying on explicit rename tracking makes it impossible to merge two trees that have done exactly the same thing, except one did it as a patch (create/delete) and one did it using some other heuristic.

    On a second note, tracking renames is really just a special case of tracking how content moves in the tree. In some cases, you may instead be interested in querying when a function was added or moved to a different file. By only relying on the ability to recreate this information when needed, Git aims to provide a more flexible way to track how your tree is changing.

    However, this does not mean that Git has no support for renames. The diff machinery in Git can detect renames automatically; this is turned on by the '-M' switch to the git-diff-* family of commands. The rename detection machinery is used by git-log(1) and git-whatchanged(1), so, for example, 'git log -M' will give the commit history with rename information. Git also supports a limited form of merging across renames. The two tools for assigning blame, git-blame(1) and git-annotate(1), both use the automatic rename detection code to track renames.
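As an illustration of the '-M' switch, here is a small sketch that reads the `--name-status` output of `git diff -M`, where a detected rename appears as `R<score>`, the old path, and the new path, separated by tabs. The function names are hypothetical, the sample text is hand-written rather than captured from a real repository, and paths that Git would quote (for example ones containing tabs) are not handled.

```python
import subprocess


def parse_name_status(output: str):
    """Extract (old path, new path, similarity %) from name-status text."""
    renames = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Rename lines look like "R<score>\t<old path>\t<new path>".
        if len(fields) == 3 and fields[0].startswith("R"):
            renames.append((fields[1], fields[2], int(fields[0][1:])))
    return renames


def renames_between(rev_a: str, rev_b: str, repo: str = "."):
    """Run git in `repo` and list renames detected between two revisions."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "-M", "--name-status", rev_a, rev_b],
        capture_output=True, text=True, check=True,
    )
    return parse_name_status(result.stdout)


# Hand-written sample, not output from a real repository.
sample = "M\tREADME\nR100\told_name.txt\tnew_name.txt\nR087\tsrc/a.c\tsrc/b.c\n"
print(parse_name_status(sample))
# [('old_name.txt', 'new_name.txt', 100), ('src/a.c', 'src/b.c', 87)]
```

Inside a real repository, something like `renames_between("HEAD~1", "HEAD")` would report the renames Git detects for the latest commit, assuming `git` is on the PATH.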

    As a very special case, 'git log' in version 1.5.3 and later has a '--follow' option that allows you to follow renames when given a single path. Mail by Linus on this topic.

    Git has a rename command, 'git mv', but that is just for convenience. The effect is indistinguishable from removing the file and adding another with a different name and the same content.
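A toy model of why the two are indistinguishable, assuming the staged state can be pictured as a mapping from path to blob id (the real index also stores file modes and other metadata). The functions `mv` and `rm_then_add` are hypothetical stand-ins, not Git commands.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Git's blob id: SHA-1 over "blob <size>\\0" plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def mv(index: dict, old: str, new: str) -> dict:
    """Toy 'git mv': re-key the staged entry under the new path."""
    updated = dict(index)
    updated[new] = updated.pop(old)
    return updated


def rm_then_add(index: dict, old: str, new: str, content: bytes) -> dict:
    """Toy 'git rm old' followed by 'git add new' with the same content."""
    updated = dict(index)
    del updated[old]
    updated[new] = blob_id(content)
    return updated


content = b"int main(void) { return 0; }\n"
index = {"main.c": blob_id(content), "README": blob_id(b"docs\n")}

via_mv = mv(index, "main.c", "src/main.c")
via_rm_add = rm_then_add(index, "main.c", "src/main.c", content)
print(via_mv == via_rm_add)  # True
```

Both routes end with the same path-to-blob mapping, so nothing in the resulting commit records which one was used.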

    I am surprised that no one has linked to the Pro Git book. Much of what I have learned comes from it.

    Also, if you can get the book Version Control with Git, do it. It is a very good book, especially for beginners.
    Here is the link - Version Control with Git.
    There is also Git from the Bottom Up.
