Why Isn't There A Git Clone Specific Commit Option?

一向 2020-12-02 17:25

In light of a recent question on SO, I am wondering why there isn't an option in git clone such that the HEAD pointer of the newly created branch will point to

4 Answers
  • 2020-12-02 17:52

    Cloning a repo is a different operation than checkout. You don't "clone a specific commit". For convenience you can clone and then checkout a particular pre-existing branch at the same time, since that is what most people want. If that doesn't meet your needs (no branch for the particular SHA you want) just use or alias some form of

    git clone -n <repo-url> <dir> && cd <dir> && git checkout <SHA>
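    A minimal sketch of such an alias, assuming a POSIX shell and a local git install. clone_at is a hypothetical helper name, not a Git command, and the demo builds a throwaway two-commit repository in a temporary directory so it runs without a network.

```shell
#!/bin/sh
# Hypothetical helper (not a Git command): clone without a checkout,
# then detach HEAD at the requested commit.
clone_at() {
    git clone --quiet --no-checkout "$1" "$2" &&
        git -C "$2" checkout --quiet --detach "$3"
}

# Throwaway identity and upstream repository so the demo is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init --quiet "$tmp/upstream"
printf 'v1\n' >"$tmp/upstream/file.txt"
git -C "$tmp/upstream" add file.txt
git -C "$tmp/upstream" commit --quiet -m first
first=$(git -C "$tmp/upstream" rev-parse HEAD)
printf 'v2\n' >"$tmp/upstream/file.txt"
git -C "$tmp/upstream" commit --quiet -am second

# Clone, but end up on "first" instead of the branch tip.
clone_at "$tmp/upstream" "$tmp/copy" "$first"
cat "$tmp/copy/file.txt"   # prints: v1
```

    Because the checkout uses --detach, the clone is left in "detached HEAD" mode; run git checkout -b <name> afterwards if you want a local branch at that commit.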
    
  • 2020-12-02 17:55

    As the other answers say, this is typically not much of an issue, but they don't say why you can't clone a specific commit. The answer is security.

    If you accidentally push confidential information and then force-push a fixed history, the commits containing the confidential information will still be stored on the server until the server's Git garbage collector determines that they are no longer needed. If the hash is known (it might, for example, be available in logs), a malicious user could request the specific commit that shouldn't have been pushed, even if you were able to verify that nobody had fetched those commits by the time you force-pushed the fixed history.

    Allowing clones only from refs ensures that only "reachable" commits are ever sent to clients.
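    The following sketch replays that scenario, assuming local repositories stand in for the server and for a developer's clone (all paths and the secret.txt contents are made up for illustration). The leaked commit survives in the server's object store after the force-push, but a fresh clone never receives it, because it is not reachable from any advertised ref.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# A bare "server" and a developer repository that pushes to it.
git init --quiet --bare "$tmp/server.git"
git init --quiet "$tmp/dev"
git -C "$tmp/dev" remote add origin "$tmp/server.git"
printf 'hello\n' >"$tmp/dev/README"
git -C "$tmp/dev" add README
git -C "$tmp/dev" commit --quiet -m public

# Accidentally push a commit containing a secret...
printf 'password=not-a-real-secret\n' >"$tmp/dev/secret.txt"
git -C "$tmp/dev" add secret.txt
git -C "$tmp/dev" commit --quiet -m oops
leaked=$(git -C "$tmp/dev" rev-parse HEAD)
git -C "$tmp/dev" push --quiet origin HEAD

# ...then force-push a fixed history.
git -C "$tmp/dev" reset --quiet --hard HEAD~1
git -C "$tmp/dev" push --quiet --force origin HEAD

# The server still stores the leaked commit (nothing has pruned it yet).
git -C "$tmp/server.git" cat-file -e "$leaked" && echo "server: still has $leaked"

# A clone only receives objects reachable from the advertised refs.
# file:// forces the regular transport; a plain local path would copy
# the object directory, unreachable objects included.
git clone --quiet "file://$tmp/server.git" "$tmp/client"
if git -C "$tmp/client" cat-file -e "$leaked" 2>/dev/null; then
    echo "client: received $leaked"
else
    echo "client: never received $leaked"
fi
```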

  • 2020-12-02 18:05

    Two answers so far (at the time I wrote this, now there are more) are correct in what they say, but don't really answer the "why" question. Of course, the "why" question is really hard to answer, except by the authors of the various bits of Git (and even then, what if two frequent Git contributors gave two different answers?).

    Still, considering Git's "philosophy", as it were, the various transfer protocols in general work by naming a reference. If they provide an SHA-1, it's the SHA-1 of that reference. For someone who does not already have direct (e.g., command-line) access to the repository, none¹ of the built-in commands allow one to refer to commits by ID. The closest thing I can find to a reason for this (and it is actually a good reason²) is this bit in the git upload-archive documentation:

    SECURITY

    In order to protect the privacy of objects that have been removed from history but may not yet have been pruned, git-upload-archive avoids serving archives for commits and trees that are not reachable from the repository's refs. However, because calculating object reachability is computationally expensive, git-upload-archive implements a stricter but easier-to-check set of rules ...

    However, it goes on to say:

    If the config option uploadArchive.allowUnreachable is true, these rules are ignored, and clients may use arbitrary sha1 expressions. This is useful if you do not care about the privacy of unreachable objects, or if your object database is already publicly available for access via non-smart-http.

    which is particularly interesting since git clone gets all reachable objects in the first place, after which your local clone could trivially check out a commit by SHA-1 ID (and create a local branch name pointing to that ID if desired, or just leave your clone in "detached HEAD" mode).
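    The upload-archive rules quoted above can be observed locally. This sketch assumes a Git new enough to know uploadArchive.allowUnreachable, and it uses --remote=. so that git archive talks to git-upload-archive in the same repository instead of a real server; the repository and file names are illustrative.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo" || exit 1

printf 'kept\n' >kept.txt
git add kept.txt
git commit --quiet -m kept
printf 'dropped\n' >dropped.txt
git add dropped.txt
git commit --quiet -m dropped
dropped=$(git rev-parse HEAD)
git reset --quiet --hard HEAD~1   # no ref points at "dropped" any more

# Local archive: reads the object database directly, no upload-archive.
git archive "$dropped" >/dev/null; local_rc=$?
# --remote=. goes through git-upload-archive, which applies its rules.
git archive --remote=. "$dropped" >/dev/null 2>&1; strict_rc=$?
# Opt out of the rules, as the documentation describes.
git config uploadArchive.allowUnreachable true
git archive --remote=. "$dropped" >/dev/null; relaxed_rc=$?

echo "local=$local_rc strict=$strict_rc relaxed=$relaxed_rc"
```

    Only the middle request is refused: the object is still present, and the refusal comes purely from upload-archive's "ref names only" policy.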

    Given these two cross-currents, I think the real answer to "why", at this point, is "nobody cares enough to add it". :-) The privacy argument is valid, but there is no reason that git clone could not check out a commit by ID after cloning, just as it can be told to check out some branch other than master³ with git clone -b .... The only drawback to allowing -b sha1 is that Git cannot check up front (before the cloning process begins) whether sha1 will be received. It can check reference names, since those are transferred (along with their branch tips or other SHA-1 values) up front, so git clone -b nonexistentbranch ssh://... terminates quickly and does not create the copy:

    fatal: Remote branch nonexistentbranch not found in upstream origin
    fatal: The remote end hung up unexpectedly
    

    If -b allowed an ID, you'd get the whole clone, then it would have to tell you: "oh gosh, sorry, can't check out that ID, I'll leave you on master instead" or whatever. (Which is more or less what happens now with a busted submodule.)
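    A sketch of why a ref name can be validated before cloning but an arbitrary SHA-1 cannot, assuming that git ls-remote shows the same ref advertisement a clone starts with. has_branch and is_advertised_tip are hypothetical helper names. An older commit that a full clone would contain is absent from the advertisement, so an up-front check on the ID would wrongly reject it.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init --quiet "$tmp/upstream"
git -C "$tmp/upstream" commit --quiet --allow-empty -m first
first=$(git -C "$tmp/upstream" rev-parse HEAD)
git -C "$tmp/upstream" commit --quiet --allow-empty -m second
second=$(git -C "$tmp/upstream" rev-parse HEAD)
branch=$(git -C "$tmp/upstream" symbolic-ref --short HEAD)
url="file://$tmp/upstream"

# Hypothetical helpers. Both look only at the ref advertisement (each ref
# name with its tip SHA-1), which a client receives before any objects.
has_branch() {
    git ls-remote --exit-code "$1" "refs/heads/$2" >/dev/null
}
is_advertised_tip() {
    git ls-remote "$1" | cut -f1 | grep -qxF "$2"
}

has_branch "$url" "$branch" && echo "$branch: found"
has_branch "$url" nonexistentbranch || echo "nonexistentbranch: not found"
is_advertised_tip "$url" "$second" && echo "second: advertised"
is_advertised_tip "$url" "$first" ||
    echo "first: not advertised, although a full clone would contain it"
```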


    ¹ While git upload-archive now enforces this "privacy" rule, this was not always the case (it was introduced in version 1.7.8.1); and many (most?) git-web servers, including the one distributed with Git itself, allow viewing by arbitrary ID. This is probably why allowUnreachable was added to upload-archive a few years after the "only by ref name" code was added (but note that releases of Git after 1.7.8 and before 2.0.0 have no way to loosen the rules). Hence, while the "security" idea is valid, there was a period (pre 1.7.8.1) when it was not enforced.

    ² There are numerous ways to "leak" ostensibly private data out of a Git repository. A new file, Documentation/transfer-data-leaks, is about to appear in Git 2.11.1, while Git 2.11.0 added some internal features (see commit 722ff7f87 among others) to immediately drop objects that are pushed but not accepted. Otherwise such objects are only garbage-collected eventually, which leaves them exposed in the meantime.

    ³ Actually, by default git clone makes a local check-out of the branch it thinks goes with the remote's HEAD reference. Usually that's master anyway, though.

  • 2020-12-02 18:10

    If your specific commit is referenced by a branch, you can do a:

    git clone -b yourBranch /url/of/the/repo
    

    The cloned repository's HEAD will then be directly at the commit that branch points to.
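    A quick local check of this, assuming a throwaway repository with a branch named yourBranch and a tag named v1.0 (both names are illustrative). The git clone documentation also states that --branch accepts a tag, in which case the clone's HEAD is detached at the tagged commit, so a tag works just as well when the commit you want has one.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
git -C "$tmp/repo" commit --quiet --allow-empty -m base
git -C "$tmp/repo" tag v1.0                     # tag the commit we want
git -C "$tmp/repo" checkout --quiet -b yourBranch
git -C "$tmp/repo" commit --quiet --allow-empty -m "work on yourBranch"
git -C "$tmp/repo" checkout --quiet -           # back to the default branch
git -C "$tmp/repo" commit --quiet --allow-empty -m "default branch moves on"

# Branch name: the clone is on yourBranch, at its tip.
git clone --quiet -b yourBranch "$tmp/repo" "$tmp/by-branch"
git -C "$tmp/by-branch" log --oneline -1

# Tag name: the clone's HEAD is detached at the tagged commit.
git clone --quiet -b v1.0 "$tmp/repo" "$tmp/by-tag"
git -C "$tmp/by-tag" log --oneline -1
```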
