Is there any documentation on how Git stores files in its repository? I've tried searching the Internet, but found no usable results. Maybe I'm using an incorrect query, or maybe this
The internal format of the repository is extremely simple. Git is in essence a user space file system that's content addressable.
Here's a thumbnail sketch.
Git stores its internal data structures as objects. There are four kinds of objects: blobs (sort of like files), trees (sort of like directories), commits (snapshots of the file system at particular points in time, along with information on how that state was reached) and tags (pointers to commits, useful for marking important ones).
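To make the "content addressable" part concrete, here is a minimal Python sketch of how a blob gets its name: Git hashes a short header (the object type, the size and a NUL byte) followed by the content. The function name blob_id is my own, chosen for illustration, and the sketch assumes a repository that uses SHA-1. The result should match what git hash-object prints for the same bytes.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would give a blob with this content."""
    # Header format: "<type> <size in bytes>\0", followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes that `echo 'hello world' | git hash-object --stdin` hashes.
    print(blob_id(b"hello world\n"))
```

Because the name depends only on the content, two files with identical bytes are stored as one blob.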
If you look inside the .git directory of a repository, you'll find an objects directory that contains files named by SHA-1 hash. Each of them represents an object. You can inspect them using the git cat-file plumbing command. Here is an example commit object from one of my repositories:
noufal@sanitarium% git cat-file -p 7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
tree c45d8922787a3f801c0253b1644ef6933d79fd4a
parent 4ee56fbe52912d3b21b3577b4a82849045e9ff3f
author Noufal Ibrahim 1322165467 +0530
committer Noufal Ibrahim 1322165467 +0530
Added a .md extension to README
You can also see the object itself at .git/objects/73/47affffd901afc7d237a3e9c9512c9b0d05c6cf7. The first two characters of the hash become the directory name and the rest becomes the file name.
You can examine other objects like this. Each commit points to a tree representing the file system at that point in time and has one (or more in case of merge commits) parent.
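If you are curious what git cat-file does for a loose object, the following Python sketch approximates it: find the file from the hash, zlib-decompress it, and split the header from the body. All function names here are hypothetical helpers of mine, not part of Git. The sketch assumes SHA-1 object names and only handles loose (unpacked) objects.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object_path(git_dir: str, oid: str) -> str:
    """Map an object ID to <git_dir>/objects/<first 2 chars>/<remaining chars>."""
    return os.path.join(git_dir, "objects", oid[:2], oid[2:])


def write_loose_object(git_dir: str, kind: str, body: bytes) -> str:
    """Store body as a loose object of the given kind and return its ID."""
    raw = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(git_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return oid


def read_loose_object(git_dir: str, oid: str):
    """Return (kind, body) for a loose object, checking the recorded size."""
    with open(loose_object_path(git_dir, oid), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode("ascii"), body


if __name__ == "__main__":
    # Use a throwaway directory in place of a real .git directory.
    with tempfile.TemporaryDirectory() as git_dir:
        oid = write_loose_object(git_dir, "blob", b"hello world\n")
        print(oid, read_loose_object(git_dir, oid))
```

Pointing read_loose_object at a real .git directory should work for any object that has not been packed yet.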
Objects are initially stored as single files in the objects directory. These are called loose objects. When you run git gc, objects that can no longer be reached are pruned and the remaining ones are packed together into a single file and delta compressed. This is more space efficient and compacts the repository. After you run gc, you can look at the .git/objects/pack/ directory to see Git packfiles. To unpack them, you can use the git unpack-objects plumbing command. The .git/objects/info/packs file contains a list of packfiles that are currently present.
The next thing you need to know is what references are. These are pointers to certain commits or objects. Your branches and other such things are implemented as references. There are two kinds: "real" (which are like hard links in a file system) and "symbolic" (which are pointers to real references, like symbolic links).
These are located in the .git/refs directory. For example, in the above repository, I'm on the master branch. My latest commit is:
noufal@sanitarium% git log -1
commit 7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
Author: Noufal Ibrahim
Date: Fri Nov 25 01:41:07 2011 +0530
Added a .md extension to README
You can see that my master reference, located at .git/refs/heads/master, points to this commit.
noufal@sanitarium% more .git/refs/heads/master
7347affffd901afc7d237a3e9c9512c9b0d05c6cf7
The current branch is stored in the symbolic reference HEAD, located at .git/HEAD. Here it is:
noufal@sanitarium% more .git/HEAD
ref: refs/heads/master
It will change if you switch branches.
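The lookup from HEAD to a commit can be sketched in a few lines of Python. The function resolve_ref is a hypothetical name of mine, not a Git API. It is a simplification: it only reads loose reference files and ignores .git/packed-refs, where Git may move references after packing, so treat it as an illustration rather than a replacement for git rev-parse.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic references until a plain object ID is found."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, name), "r", encoding="ascii") as f:
            value = f.read().strip()
        if not value.startswith("ref: "):
            return value  # a "real" reference: the file holds an object ID
        name = value[len("ref: "):]  # a symbolic reference: follow it
    raise ValueError("too many levels of symbolic references")


if __name__ == "__main__":
    # Recreate the two files shown above inside a throwaway directory.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
            f.write("7347affffd901afc7d237a3e9c9512c9b0d05c6cf7\n")
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/master\n")
        print(resolve_ref(git_dir))
```

Switching branches only rewrites the one line in HEAD. Committing only rewrites the one line in the branch file.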
Tags are references of the same kind, but unlike branches they do not move when you make new commits.
The entire repository is managed using just a DAG of commits (each of which points to a tree representing the files at a point in time) and references that point to various commits on the DAG so that you can manipulate them.
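As a rough illustration of that DAG, the Python sketch below parses the text of a commit object into its tree and parents, then walks parent links to find everything reachable from a starting commit. The names parse_commit and reachable, and the single-letter commit IDs in the demo, are made up for the example. It assumes the raw commit layout, where a blank line separates the headers from the message.

```python
def parse_commit(body: str) -> dict:
    """Split a commit object's text into its tree, parents and message."""
    headers, _, message = body.partition("\n\n")
    commit = {"tree": None, "parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit["tree"] = value
        elif key == "parent":
            commit["parents"].append(value)  # merge commits have several
    return commit


def reachable(parents_of: dict, start: str) -> set:
    """Return every commit ID reachable from start by following parents."""
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(parents_of.get(oid, []))
    return seen


if __name__ == "__main__":
    # Hypothetical history: "d" merges "b" and "c", which both branch from "a".
    history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    print(sorted(reachable(history, "d")))
```

Anything that no reference can reach this way is what git gc eventually prunes.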