Git repository internal format explained

礼貌的吻别 2021-02-04 08:18

Is there any documentation on how Git stores files in its repository? I've tried searching the Internet, but found no usable results. Maybe I'm using an incorrect query or maybe this

1 answer
  •  灰色年华
    2021-02-04 08:56

    The internal format of the repository is extremely simple. Git is in essence a user space file system that's content addressable.

    Here's a thumbnail sketch.

    Objects

    Git stores its internal data structures as objects. There are four kinds of objects: blobs (sort of like files), trees (sort of like directories), commits (snapshots of the file system at particular points in time, along with information on how that state was reached) and tags (pointers to commits, useful for marking important ones).
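
    To make "content addressable" concrete, here is a minimal Python sketch of how an object's name is derived from its content. It assumes the default SHA-1 object format; the function name git_object_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 name of `payload` stored as an object of `obj_type`."""
    # The hash covers a small header ("<type> <size>" plus a NUL byte)
    # followed by the raw content, not the content alone.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # Compare with: printf 'test content\n' | git hash-object --stdin
    print(git_object_id("blob", b"test content\n"))
```

    Because the name depends only on the type and the bytes, two files with identical content end up as one and the same blob.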

    If you look inside the .git directory of a repository, you'll find an objects directory that contains files named after SHA-1 hashes. Each of them represents an object. You can inspect them using the plumbing command git cat-file. Here is an example commit object from one of my repositories:

    noufal@sanitarium% git cat-file -p 7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
    tree c45d8922787a3f801c0253b1644ef6933d79fd4a
    parent 4ee56fbe52912d3b21b3577b4a82849045e9ff3f
    author Noufal Ibrahim  1322165467 +0530
    committer Noufal Ibrahim  1322165467 +0530
    
    Added a .md extension to README
    

    You can also see the object itself at .git/objects/73/47affffd901afc7d237a3e9c9512c9b0d05c6cf7. The file is zlib-compressed, so it is not readable as plain text.
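
    The path layout and the on-disk encoding of a loose object can be sketched in Python as follows. This is an illustration under the assumption of SHA-1 loose objects; the helper names (loose_object_path, write_loose_object, read_loose_object) are hypothetical, and the demo works on a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def loose_object_path(git_dir: Path, object_id: str) -> Path:
    # The first two hex digits name a directory, the rest name the file.
    return git_dir / "objects" / object_id[:2] / object_id[2:]


def write_loose_object(git_dir: Path, obj_type: str, payload: bytes) -> str:
    # Stored bytes: "<type> <size>", a NUL byte, then the content.
    store = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    object_id = hashlib.sha1(store).hexdigest()
    path = loose_object_path(git_dir, object_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_object(git_dir: Path, object_id: str) -> Tuple[str, bytes]:
    raw = zlib.decompress(loose_object_path(git_dir, object_id).read_bytes())
    header, _, payload = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return obj_type, payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)  # stand-in for a real .git directory
        oid = write_loose_object(git_dir, "blob", b"hello\n")
        print(loose_object_path(git_dir, oid).relative_to(git_dir))
        print(read_loose_object(git_dir, oid))
```

    Pointing read_loose_object at a real .git directory should return roughly what git cat-file -p prints for blobs and commits, as long as the object has not been moved into a packfile.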

    You can examine other objects like this. Each commit points to a tree representing the file system at that point in time and has one parent (or more, in the case of merge commits).
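
    The commit text shown above is simple enough to parse by hand. The following Python sketch splits it into header fields and message; parse_commit is a hypothetical helper, and it deliberately ignores multi-line headers (such as signatures) that real commits may carry.

```python
from typing import Dict, List, Tuple


def parse_commit(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Split a commit body into header fields and the commit message."""
    # Headers and message are separated by the first blank line.
    head, _, message = text.partition("\n\n")
    fields: Dict[str, List[str]] = {}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        # A key can repeat: merge commits carry several "parent" lines.
        fields.setdefault(key, []).append(value)
    return fields, message


# The commit object from the example above, as printed by git cat-file -p.
EXAMPLE = (
    "tree c45d8922787a3f801c0253b1644ef6933d79fd4a\n"
    "parent 4ee56fbe52912d3b21b3577b4a82849045e9ff3f\n"
    "author Noufal Ibrahim  1322165467 +0530\n"
    "committer Noufal Ibrahim  1322165467 +0530\n"
    "\n"
    "Added a .md extension to README\n"
)

if __name__ == "__main__":
    fields, message = parse_commit(EXAMPLE)
    print("tree:   ", fields["tree"][0])
    print("parents:", fields.get("parent", []))
    print("message:", message.strip())
```

    A root commit simply has no parent line, so fields.get("parent", []) returns an empty list for it.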

    Initially, each object is stored as a single file in the objects directory. These are called loose objects. When you run git gc, objects that can no longer be reached are pruned, and the remaining ones are delta-compressed and packed together into a single file. This is more space efficient and compacts the repository. After you run gc, you can look at the .git/objects/pack/ directory to see Git's packfiles. To unpack them, you can use the plumbing command git unpack-objects. The .git/objects/info/packs file contains a list of the packfiles that are currently present.
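
    A packfile is a binary format, but its first 12 bytes are easy to inspect: a "PACK" signature, a version number, and the number of objects it holds. The Python sketch below reads only that header and does not attempt to decode the delta-compressed entries; parse_pack_header is a name invented for this example.

```python
import struct
from typing import Tuple


def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # 4-byte signature, then two 4-byte big-endian unsigned integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile: bad signature")
    return version, count


if __name__ == "__main__":
    # Synthetic header for illustration: version 2, three objects.
    sample = b"PACK" + struct.pack(">II", 2, 3)
    print(parse_pack_header(sample))
```

    To try it on a real repository, read the first 12 bytes of one of the .pack files under .git/objects/pack/ in binary mode and pass them in.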

    References

    The next thing you need to know is what references are. These are pointers to certain commits or objects. Your branches and other such things are implemented as references. There are two kinds: "real" references (which are like hard links in a file system) and "symbolic" references (which point to real references, like symbolic links).

    These are located in the .git/refs directory. For example, in the above repository, I'm on the master branch. My latest commit is:

    noufal@sanitarium% git log -1
    commit 7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
    Author: Noufal Ibrahim 
    Date:   Fri Nov 25 01:41:07 2011 +0530
    
        Added a .md extension to README
    

    You can see that my master reference located at .git/refs/heads/master points to this commit.

    noufal@sanitarium% more .git/refs/heads/master
    7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
    

    The current branch is stored in the symbolic reference HEAD, located at .git/HEAD. Here it is:

    noufal@sanitarium% more .git/HEAD
    ref: refs/heads/master
    

    It will change if you switch branches.
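
    Following HEAD to a commit can be sketched in a few lines of Python. This assumes every reference exists as its own file; after git gc some references may live only in .git/packed-refs, which this sketch does not read. The function name resolve_ref and the object id in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic references until a file holding an object id is found."""
    for _ in range(max_depth):
        content = (git_dir / name).read_text().strip()
        if not content.startswith("ref: "):
            return content  # a plain reference: the object id itself
        name = content[len("ref: "):]  # symbolic: points at another reference
    raise ValueError("too many levels of symbolic references")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)  # stand-in for a real .git directory
        (git_dir / "refs" / "heads").mkdir(parents=True)
        # Made-up object id, used only for this demo.
        (git_dir / "refs" / "heads" / "master").write_text(
            "0123456789abcdef0123456789abcdef01234567\n"
        )
        (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
        print(resolve_ref(git_dir))
```

    With a detached HEAD, .git/HEAD holds an object id directly, and the loop returns on its first pass.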

    Tags are references of the same kind, but unlike branches they do not move when you make new commits.

    The entire repository is managed using just a DAG of commits (each of which points to a tree representing the files at a point in time) and references that point to various commits on the DAG so that you can manipulate them.
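
    The idea of a DAG plus references can be modelled with a toy graph. In the Python sketch below, the single-letter commit names and the ancestors helper are made up for illustration; real commits are named by their hashes, and this is not how Git itself implements traversal.

```python
from typing import Dict, Iterator, List


def ancestors(parents: Dict[str, List[str]], start: str) -> Iterator[str]:
    """Yield `start` and every commit reachable from it, each exactly once."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        yield commit
        stack.extend(parents.get(commit, []))


# Toy history: A is the root, B and C branch off A, M merges B and C.
# X is a commit that no reference leads to.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"], "X": ["A"]}
REFS = {"master": "M"}

if __name__ == "__main__":
    reachable = set(ancestors(PARENTS, REFS["master"]))
    print(sorted(reachable))
    print(sorted(set(PARENTS) - reachable))  # candidates for pruning
```

    Commits that no reference can reach, like X here, are the ones described earlier as eligible for pruning by git gc.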

    Further reading

    • I have a presentation which I use for my git trainings up here that explains some of this.
    • The community book at http://book.git-scm.com/ has some sections on the internals.
    • Scott Chacon's Pro Git book has a section on internals.
    • He also has a PeepCode PDF just about the internals.
