How does Git's transfer protocol work?

陌清茗 2020-11-29 12:04

I have been working with Git for more than a year and now I have to explain it to others in our group, which is why I need a bit more background. I went through most of the Git Book.

4 Answers
  •  有刺的猬
    2020-11-29 12:36

    Another aspect of the Git transfer protocol is its packet management, including the ACKs the server sends in response to the client's "have" lines:

    Before Git 2.27 (Q2 2020), the server end of the v2 protocol used to serve "git clone" and "git fetch" was not prepared to see a delim packet at unexpected places, which led to a crash.

    See commit cacae43 (29 Mar 2020), and commit 4845b77, commit 88124ab (27 Mar 2020) by Jeff King (peff).
    (Merged by Junio C Hamano -- gitster -- in commit 5ee5788, 22 Apr 2020)

    upload-pack: handle unexpected delim packets

    Signed-off-by: Jeff King

    When processing the arguments list for a v2 ls-refs or fetch command, we loop like this:

    while (packet_reader_read(request) != PACKET_READ_FLUSH) {
            const char *arg = request->line;
            ...handle arg...
    }
    

    to read and handle packets until we see a flush. The hidden assumption here is that anything except PACKET_READ_FLUSH will give us valid packet data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF will leave packet->line as NULL, and we'll segfault trying to look at it.

    Instead, we should follow the more careful model demonstrated on the client side (e.g., in process_capabilities_v2): keep looping as long as we get normal packets, and then make sure that we broke out of the loop due to a real flush. That fixes the segfault and correctly diagnoses any unexpected input from the client.
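    The fixed loop shape can be sketched as follows. This is a self-contained illustration, not git's actual code: the reader struct and its backing packet array are simplified stand-ins for git's `struct packet_reader` (pkt-line.h), kept only so the control flow compiles and runs on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for git's packet reader; names follow pkt-line.h
 * but the types here are illustrative only. */
enum packet_read_status {
	PACKET_READ_NORMAL,
	PACKET_READ_FLUSH,
	PACKET_READ_DELIM,
	PACKET_READ_EOF
};

struct packet_reader {
	const char **pkts;      /* simulated incoming packets */
	size_t nr, pos;
	const char *line;       /* valid only after PACKET_READ_NORMAL */
	enum packet_read_status status;
};

static enum packet_read_status packet_reader_read(struct packet_reader *r)
{
	if (r->pos >= r->nr) {
		r->line = NULL;
		return r->status = PACKET_READ_EOF;
	}
	const char *p = r->pkts[r->pos++];
	if (!strcmp(p, "FLUSH")) {
		r->line = NULL;
		return r->status = PACKET_READ_FLUSH;
	}
	if (!strcmp(p, "DELIM")) {
		r->line = NULL;
		return r->status = PACKET_READ_DELIM;
	}
	r->line = p;
	return r->status = PACKET_READ_NORMAL;
}

/* The shape the fix gives upload-pack: loop only while we get normal
 * packets, then verify we actually stopped on a flush.  Returns the
 * number of arguments handled, or -1 on a protocol error -- where the
 * old loop would have dereferenced a NULL line and crashed. */
static int read_args(struct packet_reader *request)
{
	int handled = 0;
	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
		const char *arg = request->line;
		(void)arg; /* ...handle arg... */
		handled++;
	}
	if (request->status != PACKET_READ_FLUSH)
		return -1; /* git dies here with a diagnostic instead */
	return handled;
}
```

    A delim packet (or EOF) in place of the expected flush now falls out of the loop and is diagnosed, rather than being read as argument data.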


    Before Git 2.27 (Q2 2020), the upload-pack protocol v2 gave up too early before finding a common ancestor, resulting in a wasteful fetch from a fork of a project.

    This has been corrected to match the behaviour of v0 protocol.

    See commit 2f0a093, commit 4fa3f00, commit d1185aa (28 Apr 2020) by Jonathan Tan (jhowtan).
    (Merged by Junio C Hamano -- gitster -- in commit 0b07eec, 01 May 2020)

    fetch-pack: in protocol v2, in_vain only after ACK

    Signed-off-by: Jonathan Tan
    Reviewed-by: Jonathan Nieder

    When fetching, Git stops negotiation when it has sent at least MAX_IN_VAIN (which is 256) "have" lines without having any of them ACK-ed.
    But this is supposed to trigger only after the first ACK, as pack-protocol.txt says:

    However, the 256 limit only turns on in the canonical client implementation if we have received at least one "ACK %s continue" during a prior round. This helps to ensure that at least one common ancestor is found before we give up entirely.

    The code path for protocol v0 observes this, but not protocol v2, resulting in shorter negotiation rounds but significantly larger packfiles.
    Teach the code path for protocol v2 to check this criterion only after at least one ACK was received.
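    The decision rule is small enough to state as code. This is a sketch of the criterion only (the function name and parameters are mine, not git's): `haves_in_vain` counts "have" lines sent since the last ACK, and `seen_ack` records whether any ACK has arrived at all. The v2 bug amounted to ignoring `seen_ack`.

```c
#include <assert.h>

#define MAX_IN_VAIN 256

/* Give up on negotiation only if we have already found at least one
 * common ancestor (seen_ack) AND the last MAX_IN_VAIN "have" lines all
 * went unacknowledged.  Before the fix, protocol v2 stopped on the
 * second condition alone, producing needlessly large packfiles. */
static int should_stop_negotiation(int haves_in_vain, int seen_ack)
{
	return seen_ack && haves_in_vain >= MAX_IN_VAIN;
}
```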


    As a result of the stabilization work that went into 2.27 (where v2 was not the default), protocol v2 is again the default with 2.28.

    See commit 3697caf:

    config: let feature.experimental imply protocol.version=2

    Git 2.26 used protocol v2 as its default protocol, but soon after release, users noticed that the protocol v2 negotiation code was prone to fail when fetching from some remotes that are far ahead of others (such as linux-next.git versus Linus's linux.git).
    That has been fixed by 0b07eec (Merge branch 'jt/v2-fetch-nego-fix', 2020-05-01, Git v2.27.0-rc0), but to be cautious, we are using protocol v0 as the default in 2.27 to buy some time for any other unanticipated issues to surface.

    To that end, let's ensure that users requesting the bleeding edge using the feature.experimental flag do get protocol v2.
    This way, we can gain experience with a wider audience for the new protocol version and be more confident when it is time to enable it by default for all users in some future Git version.

    Implementation note: this isn't with the rest of the feature.experimental options in repo-settings.c because those are tied to a repository object, whereas this code path is used for operations like "git ls-remote" that do not require a repository.
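    Concretely, after this change either of the following (a plain `.gitconfig` fragment; both keys are documented git configuration) results in protocol v2 being used:

```
# Opting in to the bleeding edge now implies protocol.version=2:
[feature]
	experimental = true

# Or enable v2 directly, without the other experimental features:
[protocol]
	version = 2
```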


    With Git 2.28 (Q3 2020), the "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire.

    See commit cae2ee1 (15 Jun 2020) by Ramsay Jones.
    See commit dd4b732, commit 9da69a6, commit acaaca7, commit cd8402e, commit fd194dd, commit 8d5d2a3, commit 8e6adb6, commit eb05349, commit 9cb3cab (10 Jun 2020) by Jonathan Tan (jhowtan).
    (Merged by Junio C Hamano -- gitster -- in commit 34e849b, 25 Jun 2020)

    fetch-pack: support more than one pack lockfile

    Signed-off-by: Jonathan Tan

    Whenever a fetch results in a packfile being downloaded, a .keep file is generated, so that the packfile can be preserved (from, say, a running "git repack") until refs are written referring to the contents of the packfile.

    In a subsequent patch, a successful fetch using protocol v2 may result in more than one .keep file being generated. Therefore, teach fetch_pack() and the transport mechanism to support multiple .keep files.

    Implementation notes:

    • builtin/fetch-pack.c normally does not generate .keep files, and thus is unaffected by this or future changes.
      However, it has an undocumented "--lock-pack" feature, used by remote-curl.c when implementing the "fetch" remote helper command.
      In keeping with the remote helper protocol, only one "lock" line will ever be written; the rest will result in warnings to stderr.
      However, in practice, warnings will never be written because the remote-curl.c "fetch" is only used for protocol v0/v1 (which will not generate multiple .keep files). (Protocol v2 uses the "stateless-connect" command, not the "fetch" command.)

    • connected.c has an optimization in that connectivity checks on a ref need not be done if the target object is in a pack known to be self-contained and connected. If there are multiple packfiles, this optimization can no longer be done.
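    The gist of "support more than one pack lockfile" is replacing a single path with a list the caller can clean up after refs are written. The sketch below mirrors that idea only; the struct and function names are mine, not git's actual `string_list` API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before the change, fetch_pack() effectively handed back a single
 * lockfile path (char *pack_lockfile).  With multiple packfiles per
 * fetch, collect every ".keep" path instead. */
struct lockfile_list {
	char **paths;
	size_t nr, alloc;
};

static void add_lockfile(struct lockfile_list *l, const char *path)
{
	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 4;
		l->paths = realloc(l->paths, l->alloc * sizeof(*l->paths));
	}
	char *copy = malloc(strlen(path) + 1);
	strcpy(copy, path);
	l->paths[l->nr++] = copy;
}

static void clear_lockfiles(struct lockfile_list *l)
{
	for (size_t i = 0; i < l->nr; i++)
		free(l->paths[i]); /* real code would unlink() the .keep file first */
	free(l->paths);
	l->paths = NULL;
	l->nr = l->alloc = 0;
}
```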

    Cf. Packfile URIs

    This feature allows servers to serve part of their packfile response as URIs. This allows server designs that improve scalability in bandwidth and CPU usage (for example, by serving some data through a CDN), and (in the future) provides some measure of resumability to clients.

    This feature is available only in protocol version 2.


    "git fetch --depth=<n>" over the stateless RPC / smart HTTP transport handled EOF from the client poorly at the server end.

    This is fixed, as part of the transport protocol, in Git 2.30 (Q1 2021).

    See commit fb3d1a0 (30 Oct 2020) by Daniel Duvall (marxarelli).
    (Merged by Junio C Hamano -- gitster -- in commit d1169be, 18 Nov 2020)

    upload-pack: allow stateless client EOF just prior to haves

    Signed-off-by: Daniel Duvall

    During stateless packfile negotiation where a depth is given, stateless RPC clients (e.g. git-remote-curl) will send multiple upload-pack requests with the first containing only the wants/shallows/deepens/filters and the subsequent containing haves/done.

    When upload-pack handles such requests, entering get_common_commits without checking whether the client has hung up can result in unexpected EOF during the negotiation loop and a die() with message "fatal: the remote end hung up unexpectedly".

    Real world effects include:

    • A client speaking to git-http-backend via a server that doesn't check the exit codes of CGIs (e.g. mod_cgi) doesn't know and doesn't care about the fatal. It continues to process the response body as normal.
    • A client speaking to a server that does check the exit code and returns an errant HTTP status as a result will fail with the message "error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500."
    • Admins running servers that surface the failure must workaround it by patching code that handles execution of git-http-backend to ignore exit codes or take other heuristic approaches.
    • Admins may have to deal with "hung up unexpectedly" log spam related to the failures even in cases where the exit code isn't surfaced as an HTTP server-side error status.

    To avoid these EOF related fatals, have upload-pack gently peek for an EOF between the sending of shallow/unshallow lines (followed by flush) and the reading of client haves.
    If the client has hung up at this point, exit normally.
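    Reduced to its decision, the fix looks like this. A minimal sketch with an invented function name, reusing the packet-status idea from upload-pack: after sending shallow/unshallow lines plus a flush, peek the stream once before entering the haves loop.

```c
#include <assert.h>

enum packet_read_status {
	PACKET_READ_NORMAL,
	PACKET_READ_FLUSH,
	PACKET_READ_EOF
};

/* If a stateless client already hung up before sending its haves, the
 * request was complete as far as it was concerned: exit normally (0)
 * instead of letting get_common_commits() die() on unexpected EOF. */
static int proceed_to_haves(enum packet_read_status peeked)
{
	if (peeked == PACKET_READ_EOF)
		return 0;
	return 1;
}
```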
