How can I calculate a git tree hash?

北荒 2021-02-19 07:28

For a Node.js project I need to determine the hash of my folder to check the version. Actually, I made a script to test my code (without filesystem, directly of git api for my

1 Answer
  • 2021-02-19 07:49

    Your code contains a problem; however, fixing it doesn't remove the discrepancy in the A2 case.

    The problem with your code

    The official git algorithm for computing hashes on trees drops leading zeros from the mode field. In your examples, that field contains values 100644 and 040000, and the latter is recorded by git as 40000.

    Proof:

    $ git cat-file tree 4ef57de8e81c8415d6da2b267872e602b1f28cfe|hexdump -C
    00000000  31 30 30 36 34 34 20 2e  63 6f 76 65 72 61 67 65  |100644 .coverage|
    00000010  72 63 00 44 91 70 d0 fa  eb 75 18 23 10 34 55 64  |rc.D.p...u.#.4Ud|
    00000020  fd 18 11 c0 b9 fd 73 31  30 30 36 34 34 20 2e 65  |......s100644 .e|
    00000030  64 69 74 6f 72 63 6f 6e  66 69 67 00 75 88 49 36  |ditorconfig.u.I6|
    00000040  ea 2d 35 b5 31 af 88 6a  ca d7 47 d4 fd 9b 2a 9e  |.-5.1..j..G...*.|
    00000050  31 30 30 36 34 34 20 2e  66 6c 61 6b 65 38 00 69  |100644 .flake8.i|
    00000060  e8 72 e3 0d 30 f5 c7 de  32 76 d2 89 d6 ae e8 1c  |.r..0...2v......|
    00000070  cf 4a f7 34 30 30 30 30  20 2e 67 69 74 68 75 62  |.J.40000 .github|
    ...                                                             ^^^^^
    ...                                                             !!!!!
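
    The layout visible in the dump can also be read programmatically. The following is a minimal sketch, not git's own implementation: each entry is taken to be an ASCII mode, a space, the name, a NUL byte, and 20 raw SHA-1 bytes. The function name parse_tree is made up for illustration, and because the id of the .github entry is cut off in the dump, an all-zero placeholder stands in for it.

```python
def parse_tree(body: bytes):
    """Split a raw tree object body into (mode, name, sha_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)       # end of the ASCII mode
        nul = body.index(b"\x00", space)    # end of the entry name
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", errors="surrogateescape")
        sha = body[nul + 1:nul + 21]        # 20 raw SHA-1 bytes follow the NUL
        if len(sha) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha.hex()))
        pos = nul + 21
    return entries


# First entry copied from the hexdump; the second uses a placeholder id
# (assumption: the real id of .github is not visible in the dump).
sample = (
    b"100644 .coveragerc\x00"
    + bytes.fromhex("449170d0faeb75182310345564fd1811c0b9fd73")
    + b"40000 .github\x00"
    + bytes(20)
)

for mode, name, sha_hex in parse_tree(sample):
    print(mode, name, sha_hex)
```

    Note that the directory mode comes back as 40000, exactly as stored, without the leading zero.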
    

    But adding removal of leading zeros [1] to your Perl script still doesn't fix the A2 case (although the computed hash changes, it is still different from the expected one):

    $ cat main.sh
    XX="$(perl -sane '$F[2] =~ s/(..)/\\x$1/g ; $F[0] =~ s/^0+//g ; print $F[0]." ".$F[1]."\\"."x00".$F[2]' output_a1)"
    SIZE=$(echo -en "$XX" | wc -c)
    
    echo "original: 8d66139b3acf78fa50e16383693a161c33b5e048 output:"
    echo -en "tree $SIZE\x00$XX" | sha1sum
    
    XX="$(perl -sane '$F[2] =~ s/(..)/\\x$1/g ; $F[0] =~ s/^0+//g ; print $F[0]." ".$F[1]."\\"."x00".$F[2]' output_a2)"
    SIZE=$(echo -en "$XX" | wc -c)
    
    echo "original: 4ef57de8e81c8415d6da2b267872e602b1f28cfe output:"
    echo -en "tree $SIZE\x00$XX" | sha1sum
    
    $ ./main.sh 
    original: 8d66139b3acf78fa50e16383693a161c33b5e048 output:
    8d66139b3acf78fa50e16383693a161c33b5e048  -
    original: 4ef57de8e81c8415d6da2b267872e602b1f28cfe output:
    c5c701b8114582e3bb2e353aac157a7febfcd33b  -
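
    For comparison, the same computation can be sketched without shell escaping. This is an illustrative sketch under stated assumptions, not the script above: the names normalize_mode and tree_hash are made up here, the entries are assumed to arrive already in git's order (as in the API output), and an all-zero mode is mapped to a single 0 rather than erased.

```python
import hashlib


def normalize_mode(mode: str) -> str:
    """Drop leading zeros from a mode string, keeping one digit for all-zero input."""
    return mode.lstrip("0") or "0"


def tree_hash(entries):
    """Compute the SHA-1 id of a tree from (mode, name, sha_hex) tuples.

    Assumes the entries are already in the order git would store them.
    """
    body = b"".join(
        normalize_mode(mode).encode("ascii")
        + b" "
        + name.encode("utf-8")
        + b"\x00"
        + bytes.fromhex(sha_hex)      # 20 raw bytes, not 40 hex characters
        for mode, name, sha_hex in entries
    )
    # The object header is "tree <size of body in bytes>" followed by a NUL.
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# The empty tree has a well-known id.
print(tree_hash([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

    Feeding the function the three columns of output_a1 or output_a2, one tuple per line, should reproduce the values printed by main.sh.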
    

    The problem (?) with the GitHub API

    The explanation is that the A2 hash 4ef57de8e81c8415d6da2b267872e602b1f28cfe points to a commit object rather than a tree. That commit object in turn refers to the tree with hash c5c701b8114582e3bb2e353aac157a7febfcd33b, which is exactly the value computed by the fixed code:

    $ git cat-file -t 4ef57de8e81c8415d6da2b267872e602b1f28cfe
    commit
    
    $ git cat-file -p 4ef57de8e81c8415d6da2b267872e602b1f28cfe
    tree c5c701b8114582e3bb2e353aac157a7febfcd33b
    parent 502a88b41161ec7dbff0862e3d805db397caf366
    ...
    

    Had you used the tree hash rather than the commit hash for the A2 query, you wouldn't have had any problem in the first place.
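
    If the input may be either kind of id, the indirection can be resolved before hashing. The sketch below only parses the text of a commit object such as the git cat-file -p output above; the function name tree_of_commit is hypothetical, and fetching the commit text is left to the caller.

```python
def tree_of_commit(commit_text: str) -> str:
    """Return the tree id named in the header of a commit object."""
    for line in commit_text.splitlines():
        if not line:                  # a blank line ends the header section
            break
        key, _, value = line.partition(" ")
        if key == "tree":
            return value
    raise ValueError("no tree header found; is this really a commit?")


# Header lines copied from the cat-file output above.
sample = (
    "tree c5c701b8114582e3bb2e353aac157a7febfcd33b\n"
    "parent 502a88b41161ec7dbff0862e3d805db397caf366\n"
)
print(tree_of_commit(sample))  # c5c701b8114582e3bb2e353aac157a7febfcd33b
```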

    An arguable issue with the GitHub API is that it silently resolves a commit hash to the underlying tree instead of returning an error or indicating in the response what happened (for example, by setting the sha field to the hash of the tree rather than to the query value).
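
    Since the response does not say which kind of object was queried, a client can check for itself. The following is a rough sketch, assuming a non-recursive response shaped like the trees endpoint's JSON (a top-level sha, a truncated flag, and a tree list whose items carry mode, path and sha) with entries in git's order; the function name sha_matches_listing is invented for this example.

```python
import hashlib


def sha_matches_listing(resp: dict) -> bool:
    """True if resp["sha"] is the id of the tree described by resp["tree"]."""
    if resp.get("truncated"):
        raise ValueError("truncated listing; the hash cannot be recomputed")
    body = b""
    for item in resp["tree"]:
        mode = item["mode"].lstrip("0") or "0"   # git stores 40000, not 040000
        body += (
            mode.encode("ascii")
            + b" "
            + item["path"].encode("utf-8")
            + b"\x00"
            + bytes.fromhex(item["sha"])
        )
    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest() == resp["sha"]


# Trivial self-check with the well-known id of the empty tree.
empty = {"sha": "4b825dc642cb6eb9a060e54bf8d69288fbee4904", "tree": [], "truncated": False}
print(sha_matches_listing(empty))  # True
```

    A False result for a well-formed listing suggests that the queried id named a commit (or some other object) rather than the tree itself.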


    [1] The quick-and-dirty fix fails in one case: when the mode field consists only of zeros. The mode field is then erased completely instead of being replaced by a single zero. That case cannot occur in practice, however, since an object with such a mode value would simply be inaccessible to git.
