I am confused about how a file's actual contents are stored in .git.
For example, "Version 1" is the actual text content of test.txt. When I commit it (first commit) to the repo, Git returns a SHA-1 for that file, which is located at .git\objects\0c\15af113a95643d7c244332b0e0b287184cd049.
When I open the file 15af113a95643d7c244332b0e0b287184cd049 in a text editor, it's all garbage, something like this:
x+)JMU074f040031QÐKÏ,ÉLÏË/Je¨}ºõw[Éœ„ÇR ñ·Î}úyGª*±8#³¨,1%>9?¯$5¯D¯¤¢„áôÏ3%³þú>š~}Ž÷*ë²-¶ç¡êÊòR“KâKòãs+‹sô
But I'm not sure whether this garbage represents the encrypted form of the text "Version 1", or whether the text is represented by the SHA-1 15af113a95643d7c244332b0e0b287184cd049.
The correct answer to the question in the subject line, "Git objects SHA-1 are file contents or file names?", is probably "neither", since you were referring to the contents of the loose object file rather than the original file; and even if you were referring to the original file, that's still not quite right.
A loose object, in Git, is a plain file. The name of the file is constructed from the object's hash ID. The object's hash ID, in turn, is constructed by computing a hash of the object's contents with a prefix header attached.
The prefixed header depends on the object type. There are four types: blob, commit, tag, and tree. The header is a zero-terminated byte string composed of the type name as an ASCII (or equivalently, UTF-8) byte string, followed by a space, followed by the decimal representation of the size of the object in bytes, followed by an ASCII NUL (b'\x00' in Python, if you prefer modern Python notation, or '\0' if you prefer C).
After the header come the actual object contents. So, for a file containing the byte string b'hello\n', the data to be hashed consist of b'blob 6\0hello\n':
$ echo 'hello' | git hash-object -t blob --stdin
ce013625030ba8dba906f756967f9e9ca394464a
$ python3
[...]
>>> import hashlib
>>> s = b'blob 6\0hello\n'
>>> hashlib.sha1(s).hexdigest()
'ce013625030ba8dba906f756967f9e9ca394464a'
Hence, the file name that would be used to store this file is (derived from) ce013625030ba8dba906f756967f9e9ca394464a. As a loose object, it becomes .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a.
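The steps above (build the header, hash header plus contents, split the hash into directory and file name) can be sketched in a few lines of Python. The function names git_object_id and loose_object_path are made up for this illustration and are not part of Git; the sketch also assumes a SHA-1 repository, as in the example above.

```python
import hashlib


def git_object_id(data: bytes, obj_type: str = "blob") -> str:
    """Hash the contents with the '<type> <size>' + NUL header prepended."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex digits name the directory, the remaining 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_object_id(b"hello\n")
print(oid)                     # same ID that git hash-object printed above
print(loose_object_path(oid))
```

Note that the hash depends only on the type, the size, and the bytes of the contents; the file's name in the work-tree plays no part in it.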
The contents of that file, however, are the zlib-compressed form of b'blob 6\0hello\n' (with, apparently, level=1; the default in Python's zlib module is currently 6 and the result does not match at that level. It's not clear whether Git's zlib deflate exactly matches Python's, but using level 1 did work here):
$ echo 'hello' | git hash-object -w -t blob --stdin
ce013625030ba8dba906f756967f9e9ca394464a
$ vis .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
x\^AK\M-J\M-IOR0c\M-HH\M-M\M-I\M-I\M-g\^B\000\^]\M-E\^D\^T$
(note that the final $ is the shell prompt again; now back to Python3)
>>> import zlib
>>> zlib.compress(s, 1)
b'x\x01K\xca\xc9OR0c\xc8H\xcd\xc9\xc9\xe7\x02\x00\x1d\xc5\x04\x14'
>>> import vis
>>> print(vis.vis(zlib.compress(s, 1)))
x\^AK\M-J\M-IOR0c\M-HH\M-M\M-I\M-I\M-g\^B\^@\^]\M-E\^D\^T
where vis.py is:
def vischr(byte):
    "encode characters the way vis(1) does by default"
    if byte in b' \t\n':
        return chr(byte)
    # control chars: \^X; del: \^?
    if byte < 32 or byte == 127:
        return r'\^' + chr(byte ^ 64)
    # printable characters, 32..126
    if byte < 128:
        return chr(byte)
    # meta characters: prefix with \M^ or \M-
    byte -= 128
    if byte < 32 or byte == 127:
        return r'\M^' + chr(byte ^ 64)
    return r'\M-' + chr(byte)

def vis(bytestr):
    "same as vis(1)"
    return ''.join(vischr(c) for c in bytestr)
(vis produces an invertible but printable encoding of binary files; it was my 1993-ish answer to problems with cat -v).
Note that the names of files stored in a Git repository (under a commit) appear only as path name components stored in individual tree objects. Computing the hash ID of a tree object is nontrivial; I have Python code that does this in my public "scripts" repository under githash.py.
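To give a rough idea of what a tree object contains, here is a minimal sketch for the simplest case only: a single flat directory of regular files. It is not the githash.py code mentioned above, and the function name tree_id is invented for this example. Each entry is the mode, a space, the file name, a NUL byte, and then the blob's hash as 20 raw bytes; the usual header, with type tree, goes in front. Subdirectories, which Git sorts as if their names ended in a slash, are deliberately not handled.

```python
import hashlib


def tree_id(entries):
    """entries: iterable of (mode, name, blob_hex_id), regular files only."""
    body = b""
    # For plain files, entries are ordered by the raw bytes of their names.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex digits
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# A directory holding one file whose blob is the 'hello' example from above.
blob = "ce013625030ba8dba906f756967f9e9ca394464a"
print(tree_id([("100644", "test.txt", blob)]))
```

This is where file names enter the picture: renaming test.txt changes the tree's hash but leaves the blob's hash untouched.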
Git Magic mentions:
By the way, the files within .git/objects are compressed with zlib so you should not stare at them directly. Filter them through zpipe -d, or type (using git cat-file):
$ git cat-file -p 0c15af113a95643d7c244332b0e0b287184cd049
With zpipe:
$ ./zpipe -d < .git/objects/0c/15af113a95643d7c244332b0e0b287184cd049
Note: for zpipe, I had to compile zpipe.c first:
sudo apt-get install zlib1g-dev
cd /usr/share/doc/zlib1g-dev/examples
sudo gunzip zpipe.c.gz
sudo gcc -o zpipe zpipe.c -lz
Then run it with a loose object file on standard input. You will get a result like:
vonc@VONCAVN7:/mnt/d/git/seec$ /usr/share/doc/zlib1g-dev/examples/zpipe -d < .git/objects/0d/b6225927ef60e21138a9762c41ea0db714ca0d
blob 2142 <full content there...>
You see a header composed of the type and content size, followed by the actual content.
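If zpipe is not at hand, the same inspection can be sketched with Python's zlib module. The function name parse_loose_object is made up for this example, and the demo compresses an in-memory object rather than reading a real file, so treat it as an illustration of the layout rather than a replacement for git cat-file.

```python
import zlib


def parse_loose_object(raw: bytes):
    """Decompress a loose object and split it into (type, size, contents)."""
    data = zlib.decompress(raw)
    # Everything up to the first NUL byte is the header; the rest is content.
    header, _, contents = data.partition(b"\x00")
    obj_type, size_text = header.split(b" ")
    size = int(size_text)
    if size != len(contents):
        raise ValueError("size in header does not match contents")
    return obj_type.decode("ascii"), size, contents


# Round trip on an in-memory object; for a real one, read the bytes of a
# file under .git/objects/ in binary mode and pass them in instead.
raw = zlib.compress(b"blob 6\x00hello\n", 1)
print(parse_loose_object(raw))
```

The result is plain decompressed data: there is no encryption involved, only zlib compression of the header followed by the original bytes.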
See "Understanding Git Internals" by Jeff Kunkle, slide 8, for an illustration of a blob's actual content.
Source: https://stackoverflow.com/questions/44475891/git-objects-sha-1-are-file-contents-or-file-names