Why did git push so much data?

囚心锁ツ 2020-12-29 07:19

I'm wondering what git is doing when it pushes changes, and why it occasionally seems to push far more data than the changes I've made. I made some changes to two

2 Answers
  •  孤城傲影
    2020-12-29 07:50

    I just realized there is a very realistic scenario that can result in an unusually big push.

    Which objects does push send? Those that do not yet exist on the server, or rather, those it did not detect as existing. How does it check whether an object exists? At the beginning of a push, the server sends the references (branches and tags) that it has. So, for example, suppose they have the following commits:

      CLIENT                                     SERVER
     (foo) -----------> aaaaa1
                          |
     (origin/master) -> aaaaa0                (master) -> aaaaa0
                          |                                 |
                         ...                               ...
    

    Then the client will receive something like refs/heads/master aaaaa0 and find that it only has to send what is new in commit aaaaa1.
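    The selection described above can be sketched as a toy Python model. This is a simplified illustration, not Git's actual implementation: it models only commits (a real push also sends trees and blobs), uses the hypothetical IDs from the diagram plus a placeholder "older" for the "..." history, and ignores thin packs and deltas. The client sends whatever is reachable from its local tip but not from any advertised tip it already has:

```python
# Toy model of the "what to send" decision in a push.
# Assumption: only commits are modeled; real Git also sends trees and blobs
# and uses thin packs and deltas, which are ignored here.

# Each commit maps to its list of parent commit IDs (IDs from the diagram;
# "older" is a placeholder for the "..." history).
PARENTS = {
    "aaaaa1": ["aaaaa0"],
    "aaaaa0": ["older"],
    "older": [],
}


def reachable(tips, parents):
    """Every commit reachable from the given tips via parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commits_to_send(local_tip, advertised, parents):
    """Commits reachable from local_tip, minus everything reachable from
    advertised server tips that the client already has locally."""
    known_tips = [sha for sha in advertised.values() if sha in parents]
    return reachable([local_tip], parents) - reachable(known_tips, parents)


advertised = {"refs/heads/master": "aaaaa0"}  # what the server announces
print(sorted(commits_to_send("aaaaa1", advertised, PARENTS)))  # ['aaaaa1']
```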

    But if somebody has pushed anything to the remote master, things are different:

      CLIENT                                     SERVER
     (foo) -----------> aaaaa1                      (master) --> aaaaa2
                          |                                       /
     (origin/master) -> aaaaa0                                 aaaaa0
                          |                                      |
                         ...                                    ...
    

    Here, the client receives refs/heads/master aaaaa2, but it knows nothing about aaaaa2, so it cannot deduce that aaaaa0 exists on the server. So, in this simple case with only two branches, the whole history will be sent instead of only the incremental part.

    This is unlikely to happen in a mature, actively developed project that has tags and many branches, some of which become stale and stop being updated: those stale refs usually point to commits the client already has, so most of the shared history is still recognized. Users might be sending a bit more than necessary, but the difference is not as big as in your case, so it goes unnoticed. In very small teams, however, it can happen more often, and the difference can be significant.

    To avoid this, you could run git fetch before pushing. Then, in my example, commit aaaaa2 would already exist on the client, and when pushing foo, git push would know that it should not send aaaaa0 and the older history.
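    The failure case and the git fetch fix can be illustrated with the same kind of toy model. This is a hedged sketch, not Git's code: it models commits only, uses the hypothetical IDs from the second diagram with "older" standing in for the "..." history, and represents fetching simply as the client learning commit aaaaa2 and its parent:

```python
# Toy model (commits only) of the case where the server's master has moved
# to aaaaa2, a commit the client has not seen. IDs are the hypothetical ones
# from the diagram; "older" stands in for the "..." history.


def reachable(tips, parents):
    """Every commit reachable from the given tips via parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commits_to_send(local_tip, advertised, parents):
    """Commits reachable from local_tip but not from any advertised tip the
    client has locally (unknown tips cannot be walked, so they exclude nothing)."""
    known_tips = [sha for sha in advertised.values() if sha in parents]
    return reachable([local_tip], parents) - reachable(known_tips, parents)


advertised = {"refs/heads/master": "aaaaa2"}

# Before fetching: the client knows nothing about aaaaa2.
client = {"aaaaa1": ["aaaaa0"], "aaaaa0": ["older"], "older": []}
before = commits_to_send("aaaaa1", advertised, client)

# After fetching: the client has aaaaa2 and knows its parent is aaaaa0.
fetched = dict(client, aaaaa2=["aaaaa0"])
after = commits_to_send("aaaaa1", advertised, fetched)

print("before fetch:", sorted(before))  # before fetch: ['aaaaa0', 'aaaaa1', 'older']
print("after fetch:", sorted(after))    # after fetch: ['aaaaa1']
```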

    Read here for how push is implemented in the protocol.
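    For reference, in the original (version 0) Git wire protocol the server's initial list of references is sent as pkt-lines: each line is prefixed with four hex digits giving its total length, including those four digits; the first line carries the capability list after a NUL byte; and a "0000" flush-pkt ends the list. The sketch below parses such a list. The object IDs and capabilities in the sample are made up, and smart HTTP and protocol version 2 frame the exchange differently, so treat this as an illustration only:

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame one pkt-line: 4 hex digits of total length (counting the four
    length bytes themselves) followed by the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes):
    """Yield pkt-line payloads until a flush-pkt ("0000") is reached."""
    pos = 0
    while pos + 4 <= len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:  # flush-pkt: end of the reference list
            return
        yield data[pos + 4:pos + length]
        pos += length


def parse_ref_advertisement(data: bytes) -> dict:
    """Map ref names to object IDs; capabilities after the NUL are dropped."""
    refs = {}
    for payload in parse_pkt_lines(data):
        line = payload.rstrip(b"\n").split(b"\0", 1)[0]
        sha, name = line.split(b" ", 1)
        refs[name.decode()] = sha.decode()
    return refs


# Hypothetical 40-hex-digit object IDs, padded from the diagram's labels.
master = b"aaaaa2" + b"0" * 34
tag = b"bbbbb0" + b"0" * 34

sample = (
    pkt_line(master + b" refs/heads/master\0report-status delete-refs ofs-delta\n")
    + pkt_line(tag + b" refs/tags/v1.0\n")
    + b"0000"
)
print(parse_ref_advertisement(sample))
```

    In the toy models above, only advertised IDs the client already has locally help it skip history; an ID like aaaaa2 that it has never seen is simply ignored.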

    PS: the recent Git commit-graph feature might help with this, but I have not tried it.
