need clarification on pulling git branches

南笙 2021-01-21 12:53

I always struggle with pulling git branches and never get it right. I am a solo user here. My workflow is: create master and dev-stage1, dev-stage2, push code to origin and then

2 answers
  •  逝去的感伤
    2021-01-21 13:28

    I am a solo user here.

    I assume this means that you work on your own repositories only. However, you then say:

    git remote set-url path/to/remote/repo

    which suggests that you would like to co-ordinate with others, which contradicts the "solo user" claim.

    Meanwhile, let's start with the basics.

    Version control, repositories, and work-trees

    When you use any version control system (VCS), you are declaring an interest in, well, controlling versions. That is, you want to keep, and be able to access, older versions of various files. To do this, we need to store each saved version of each saved file somewhere. The place in which these versions are saved is a repository.

    Some version control systems operate on individual files. Git does not: Git stores commits, which are whole sets of files at a time. The unit of revisioning, or versioning, is the commit. If commits were simply numbered sequentially (though they're not in Git), the very first commit, commit #0 or #1 depending on how we count, might have a dozen files in it. Each subsequent commit also has all those files (plus any you added, minus any you removed). Telling the VCS "get me version 3" means "go back in time to when I saved version #3, and get all those files."

    To make this work, a commit-oriented VCS needs a work tree (or hyphenated, or "working tree", or any number of similar variations on this theme). In this work tree, you have your files. If you extract an old version, you get all the files the way they were as of that version. If you jump to the latest ("head" of a branch) version, you get all the latest files. Meanwhile you can also change the files in the work tree, to do work on them. Eventually you will tell the VCS to save the new work-tree as a new commit. (Git adds several wrinkles here.)

    Git's commits and Git-style branches

    Different VCSes have different ways of dealing with branches. Git's is quite unusual. Git's branches are formed by Git's commits. Each commit in Git (in fact, every object that Git stores inside the repository, although you will mostly only see these for commits) has a unique ID that Git assigns, through some deep magic,[1] that is an incomprehensible (and usually unpronounceable) string of digits and letters: 1f93ca2395be0f98... or some such.

    We already mentioned that a commit stores a snapshot of a work-tree, as it was at the time of the commit. (Git actually stores a snapshot of its index instead, but we'll leave that for another posting entirely.)

    In Git, each commit has not only this work-tree snapshot, but also:

    • the committer: the name and email address of the person who made the commit, with a time-stamp
    • the author: the name and email address of the person who authored the files (usually the same as the committer but one can email patches around and hence get them to be separate), with a time-stamp
    • a log message, which is something the committer writes to describe the commit to themselves and others looking back at it later
    • the identity of a parent commit
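
    These fields can be seen directly in a stored commit. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the identity, file name, and commit messages are placeholders invented for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo "more" >> file.txt
git commit -q -a -m "second commit"

# Show the raw commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD

# Pull out individual fields with format placeholders.
author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
subject=$(git log -1 --format='%s')
parent=$(git log -1 --format='%P')
echo "author=$author committer=$committer"
```

    The parent field of the second commit holds the full hash of the first one, which is what the next paragraphs build on.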

    The parent commit is the hash ID of the commit that comes before this new commit. That is, if we start with a completely empty repository and make a first commit, we might draw it like this:

    A
    

    (using a single uppercase letter instead of an incomprehensible hash ID—we'll run out after just 26 commits!).

    Now when we go to make a new commit, it looks like this:

    A <-B
    

    We say that the new commit B "points to" the first commit A. Since A was the very first commit, it doesn't point anywhere: it has no parent at all. It can't; it was the first commit. The technical term for this is that A is a root commit.

    When we make the third commit C, it points back to B:

    A <-B <-C
    

    and so on.
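
    To check the backwards-pointing chain for yourself, here is a small sketch. It assumes a scratch repository and uses empty commits named A, B, and C purely as stand-ins for the letters in the drawings.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Three commits in a row; --allow-empty lets us skip creating files.
for name in A B C; do
  git commit -q --allow-empty -m "$name"
done

A=$(git rev-parse HEAD~2)
B=$(git rev-parse HEAD~1)
C=$(git rev-parse HEAD)

# Each output line is a commit hash followed by its parent hash, if any.
git rev-list --parents HEAD

parent_of_C=$(git rev-parse "$C^")
parent_of_B=$(git rev-parse "$B^")
root=$(git rev-list --max-parents=0 HEAD)  # commits with no parent at all
```

    The last line of the rev-list output has only one hash on it: that is the root commit, which has no parent.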

    Drawing these arrows is a pain in text, and not all that useful since these arrows obviously always point backwards. You can't have a commit point forwards to a child that does not exist yet; you can only point backwards to the parent that does. And these arrows can never change: nothing about any commit can ever change. (If you try to change something, the hash ID changes, because the hash ID is a cryptographic checksum of the contents!) So we just make a connecting line:

    A--B--C--D
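
    The claim that nothing about a commit can change is easy to test. This sketch assumes a scratch repository with placeholder file and message names; it "edits" a commit message with git commit --amend and compares hashes.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "original message"
before=$(git rev-parse HEAD)

# "Changing" the message really writes a brand-new commit with a new hash.
git commit -q --amend -m "edited message"
after=$(git rev-parse HEAD)

# The original object is untouched and can still be read by its old hash.
old_subject=$(git log -1 --format='%s' "$before")
echo "before: $before"
echo "after:  $after"
```

    The old commit is still in the repository under its old hash; the branch simply points at the replacement now.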
    

    To find the latest commit, Git needs a bit of help. This is where branch names enter the picture: a branch name is just a name with an arrow pointing to some commit.

    Unlike the arrows coming out of commits, the arrow coming out of a branch name is not fixed. It changes all the time, as we add new commits. So we draw them in:

    A--B--C--D   <-- master
    

    The branch name master points to the tip (most recent) commit on the branch.
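
    As a rough illustration of a branch name being nothing more than a movable pointer, the sketch below (scratch repository, empty commits named after the drawing) records where master points before and after adding commit D.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C; do
  git commit -q --allow-empty -m "$name"
done
tip_before=$(git rev-parse master)         # hash of C

git commit -q --allow-empty -m "D"
tip_after=$(git rev-parse master)          # hash of D: the name moved by itself

# List every branch name together with the commit it points to.
git for-each-ref --format='%(refname:short) -> %(objectname:short)' refs/heads
```

    Nothing inside commit C changed when D was made; only the name master was updated.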

    To make a new branch, in Git, we simply pick any starting commit that we already have—often some existing branch tip like D—and make it the current commit, while also making a new branch name that points to it:

    A--B--C--D   <-- br1 (HEAD), master
    

    We now have two names pointing to commit D, so we need to know which one is "ours". That's why we add HEAD here, so that we know that the branch we are "on" is named br1. Now let's make a new commit E. Git will move the current branch name br1 to point to the new commit. The new commit will point back to the commit we were on, i.e., D. We will need to draw this on a new line:

    A--B--C--D   <-- master
              \
               E   <-- br1 (HEAD)
    

    Let's get back on master and add a new commit there as well, by doing git checkout master, making some changes to some files, and then git adding and git committing them to make F:

    A--B--C--D--F   <-- master (HEAD)
              \
               E   <-- br1
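
    The graph drawn above can be rebuilt in a scratch repository. This is only a sketch: the commits are empty and are named A through F to match the picture, and the branch name br1 is the one used in the text.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for name in A B C D; do
  git commit -q --allow-empty -m "$name"
done
D=$(git rev-parse master)

git checkout -q -b br1                     # new name pointing at D, HEAD attached to br1
git commit -q --allow-empty -m "E"         # br1 moves to E, master stays at D

git checkout -q master
git commit -q --allow-empty -m "F"         # master moves to F

# Draw the commit graph with both branch names.
git log --graph --oneline --all
```

    Both E and F have D as their parent, which is the fork in the drawing.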
    

    This thing we drew here is the commit graph. This graph is technically a Directed Acyclic Graph or DAG, so it's also called "the DAG". Understanding these Git DAGs is one of the keys to using Git effectively.


    [1] This ID is actually, currently, a 160-bit number represented in hexadecimal. The ID is found by computing a cryptographic hash over the contents of the object. This guarantees that each one is unique, though with an infinitesimal probability of failure that grows over time. To keep the chance acceptable, it's probably wise not to put more than about 1.7 quadrillion objects into any one Git repository. See "How does SHA generate unique codes for big files in git" for more.


    Remotes and distributed repositories: git fetch and git push

    What makes Git particularly interesting and modern is the idea that we can distribute repositories. That is, we can have one of these repositories, with its commits and branches, and then throw in a second Git repository, with its own commits and branches. The way Git makes this work internally is the reason for those weird hash IDs in the first place: these IDs are not only unique to your repository, but in fact unique across all shared repositories.

    This means that if you cross-connect two different Gits and tell them to share some commits, each Git can tell whether the other Git already has the commit, or not. If you are getting commits from them, but you already have that particular one, you don't have to get it again. If you don't have it yet, you get it, and now you have it. If you are giving commits to them, the same method works the same way: if you both have the hash ID, you both have the object; if not, one Git gives a copy of the object to the other, and now both have it.

    Because each commit parent link is a hash ID, giving or getting all the commits they or you don't have yet is sufficient. Whoever didn't have the commits, now does. The new DAG in whoever got the commits (and other related objects) is now full and complete.

    This process of transferring commits is Git's fetch and push operations. Running git fetch means "call up some other Git, and get (fetch) commits and other objects from that Git, into my repository." Running git push means "call up some other Git, and give them commits and other objects from (pushed from) my repository."
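
    Here is one way to watch a commit travel between repositories. It is a sketch under stated assumptions: the bare repository hub.git and the two clones alice and bob are made-up names in a temporary directory, and the identity is a placeholder.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# A bare repository plays the part of the "other Git".
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master

# Two independent clones (cloning an empty repository prints a warning).
git clone -q hub.git alice 2>/dev/null
git clone -q hub.git bob 2>/dev/null

cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "A"
pushed=$(git rev-parse master)
git push -q origin master                  # give our commit to the other Git

cd "$work/bob"
git fetch -q origin                        # get commits we do not have yet
fetched=$(git rev-parse origin/master)
in_hub=$(git --git-dir="$work/hub.git" rev-parse master)
```

    All three repositories now hold the same commit under the same hash ID, which is why each side can tell what the other already has.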

    Remote-tracking branches

    There's a problem here though, especially on the git fetch side. We noted above that Git finds the latest commits via branch names. When we get some new commits from some other Git, something interesting happens. Consider the graph we drew above:

    A--B--C--D--F   <-- master (HEAD)
              \
               E   <-- br1
    

    and suppose that we git fetch and bring in two new commits G and H, that we draw like this:

               G--H
              /
    A--B--C--D--F   <-- master (HEAD)
              \
               E   <-- br1
    

    How will we have our Git find commit H? If they were just sequential letters like this, our Git could, say, remember that there are eight commits, and go find H. But they're not—they're incomprehensible hash IDs. We use our own branch names, like master and br1, to remember the hash IDs of F and E respectively.

    This is where remote-tracking branch names enter the picture. (This term, remote-tracking branch, is not a superb name in my opinion, but it's what we have and it suffices.)

    For their Git to have commit H, they must have some branch name—probably their master—pointing to commit H. If we have our Git remember their Git's branch names, but under some other name, we can have our Git locate H that way. So here's what we get:

               G--H   <-- origin/master
              /
    A--B--C--D--F   <-- master (HEAD)
              \
               E   <-- br1
    

    The name origin/master, where we prefix their branch name with origin/, keeps track of "where master was on that other Git".

    The name origin comes from what Git calls a remote. The standard single remote name for any other Git is origin, because we usually get this all set up by doing git clone. We clone some other existing Git repository, getting all of its commits and all of its branches. We then rename all its branches, so that its master is our origin/master and its br1 is our origin/br1.

    (This remote, by the way, is primarily a short name for the URL. But it's also the prefix of each of these remote-tracking branches.)
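
    A small sketch of remote-tracking names in action follows. The repository names theirs and ours are assumptions for the demo, as are the empty commits; the point is that git fetch updates origin/master and leaves our own master alone.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q theirs
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"

cd "$work"
git clone -q theirs ours                   # sets up the remote named origin

cd "$work/theirs"
git commit -q --allow-empty -m "G"         # they move ahead after we cloned

cd "$work/ours"
before=$(git rev-parse origin/master)
git fetch -q origin
after=$(git rev-parse origin/master)
mine=$(git rev-parse master)
their_tip=$(git -C "$work/theirs" rev-parse master)

url=$(git config --get remote.origin.url)  # the URL that "origin" stands for
git branch -r                              # list remote-tracking names
```

    After the fetch, origin/master matches their master, while our master still points where it did before.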

    While you can git checkout a remote-tracking branch name (try git checkout origin/master for instance), this immediately results in what Git calls a detached HEAD. In this case, the name HEAD no longer refers to any branch. What we get looks like this:

               G--H   <-- HEAD, origin/master
              /
    A--B--C--D--F   <-- master
              \
               E   <-- br1
    

    The name HEAD now points directly to commit H, instead of containing the name of a branch that points to commit H. Our master points to commit F and our br1 points to E; we don't have any branch name pointing to H. We only have one of these remote-tracking branch names, and a remote-tracking branch is not a branch: it's just a name.[2]


    [2] Worse, Git has a verb, tracking, that means something different from all of these. You might see now why I think "remote-tracking branch" is not a superb name. How many times can we use the words "remote", "tracking", and "branch", in different ways to mean different things, before we get all confused? :-)


    What git checkout does

    We already mentioned that we can use git checkout to check out a commit or a branch. These are the main two things that it does: check out a commit, or check out a branch.

    Which one does it do? Well, it "prefers" branches. If you:

    git checkout master
    

    then, since master is a branch name, it checks out the branch name master, attaching HEAD to master. Likewise for br1: that's a branch name, so it can be checked out as a branch.

    If you git checkout <commit-hash>, though, it checks out the specific commit, and goes into this "detached HEAD" mode. The same happens if you try to check out a tag name, or a remote-tracking branch name. Neither of those is a branch name, so you can't be "on" those as branches, so it just checks out the commit.
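
    The difference between the two modes can be observed with git symbolic-ref, which succeeds only when HEAD is attached to a branch. This is a sketch in a scratch repository; the tag name v1 and the empty commits are invented for the demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
first=$(git rev-parse HEAD~1)

git checkout -q master                     # branch name: HEAD is attached
attached=$(git symbolic-ref --short HEAD)

git checkout -q "$first"                   # raw hash: detached HEAD
at_hash=$(git rev-parse HEAD)
if git symbolic-ref -q HEAD >/dev/null; then by_hash=attached; else by_hash=detached; fi

git tag v1 master
git checkout -q v1                         # tag name: detached as well
if git symbolic-ref -q HEAD >/dev/null; then by_tag=attached; else by_tag=detached; fi

git checkout -q master                     # re-attach before doing more work
```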

    When git checkout checks out a commit, it re-arranges the work-tree (and Git's index, which we mentioned before and are still not going to explain here) to match that commit. The same is actually true of checking out a branch. When we git checkout master, we get on branch master, as git status will say; but this has the effect of filling in the index and work-tree from the tip commit of that branch.

    Being on a branch means that when we make a new commit, Git will make the branch name point to the new commit. We saw how master and br1 grew new commits E and F above: this happens because we are on those branches when we git commit.

    Merging: git merge

    Whenever we are on some branch, and have a clean index and work-tree (use git status to check, and use git status often!), we can ask Git to merge our commit with some other commit, to make a new merge commit.[3]

    To perform this merge action, Git must find the merge base. If you have used other VCSes, some of them require that you manually find the merge base. Git uses the commit graph—the DAG—to find the merge base for you.

    Let's continue with our example, where we've brought in two commits that we name, in our repository, via origin/master. Let's get on our master, which is our commit F. I'm going to re-draw the graph and leave out br1 entirely here because we don't need it now:

    A--B--C--D--F   <-- master (HEAD)
              \
               G--H   <-- origin/master
    

    Now that we're on branch master, and git status says nothing to commit, working tree clean, we'll run:

    git merge origin/master
    

    This tells our Git to find commit H and merge it with our current commit F (which our Git finds through our HEAD). Git searches through the commit graph to find the first commit that is reachable from both H and F, working backwards along the parent arrows. We can see, just by looking, that this is commit D.
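
    You can ask Git for the merge base yourself with git merge-base. The sketch below recreates the graph from the text under some assumptions: theirs and ours are made-up repository names, and the helper function commit is hypothetical, adding one file named after each commit so that the diffs have something to show.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q theirs
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
for name in A B C D; do commit "$name"; done

cd "$work"
git clone -q theirs ours                   # master and origin/master both start at D

cd "$work/theirs"
commit G
commit H

cd "$work/ours"
D=$(git rev-parse master)
commit F
git fetch -q origin                        # brings in G and H as origin/master

base=$(git merge-base master origin/master)
ours_changed=$(git diff --name-only "$base" master)
theirs_changed=$(git diff --name-only "$base" origin/master)
echo "we changed: $ours_changed"
echo "they changed:" $theirs_changed
```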

    Git then, in effect, runs:

    git diff D F    # to find out what we changed since D
    git diff D H    # to find out what they changed
    

    The merge code then does its best to combine those changes. If all goes well, it writes the combined changes into the index and work-tree, and then runs git commit to make a new merge commit.

    This merge commit goes on our master, as usual, but it's a bit odd in that it has two parents. The first parent is our previous branch tip commit F. The second parent is the commit we just merged, which is commit H. The result looks like this, graph-wise:

    A--B--C--D--F---I   <-- master (HEAD)
              \    /
               G--H   <-- origin/master
    

    Our master now points to this new merge commit I, and I points back to both F and H.
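
    The two parents of a merge commit can be read back with the ^1 and ^2 suffixes. This sketch rebuilds the same graph in scratch repositories (theirs, ours, and the commit helper are assumed names) and passes -m to git merge so that no editor opens.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q theirs
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
for name in A B C D; do commit "$name"; done
cd "$work"
git clone -q theirs ours
cd "$work/theirs"
commit G
commit H

cd "$work/ours"
commit F
git fetch -q origin
F=$(git rev-parse master)
H=$(git rev-parse origin/master)

git merge -q -m "I" origin/master          # true merge: makes merge commit I

first_parent=$(git rev-parse HEAD^1)       # where master was before the merge
second_parent=$(git rev-parse HEAD^2)      # the commit we merged in
git log --graph --oneline
```

    Because the merge combined both sets of changes, the work-tree now contains F.txt alongside G.txt and H.txt.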


    [3] This is yet another example of Git overloading words: we merge (as a verb) two commits, and then we make a merge commit (merge as an adjective), which we call a merge (merge as a noun). It's important to keep in mind that merge-as-a-verb is an action, while merge-as-a-noun (or adjective) refers to a merge commit. We can get Git to do the merge-as-a-verb without creating a merge commit, if we want to. But that, too, is a topic for later.


    Note that git merge doesn't always merge

    Sometimes git merge does not have to merge things. For instance, suppose we never made commit F at all? Suppose we started with this instead:

    A--B--C--D   <-- master (HEAD)
              \
               G--H   <-- origin/master
    

    If we now run git merge origin/master, Git can see that the merge base commit D is the current commit. That means Git does not have to do any work—it does not have to merge-as-a-verb at all. Instead, Git can just git checkout commit H, and also make our name master point to commit H:

    A--B--C--D
              \
               G--H   <-- master (HEAD), origin/master
    

    and now we don't need the kink in the graph:

    A--B--C--D--G--H   <-- master (HEAD), origin/master
    

    In another fit of silly naming syndrome, Git calls this a fast-forward merge, even though there is no merging involved (nor any tape-recording devices that could be spun forward at high speed, though by now we are all used to "fast-forwarding" through digital movies on Netflix or whatever).
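
    To confirm that a fast-forward creates no new commit, one can count merge commits afterwards. The sketch assumes the same made-up theirs and ours repositories and the hypothetical commit helper, this time without ever making commit F.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q theirs
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
for name in A B C D; do commit "$name"; done
cd "$work"
git clone -q theirs ours
cd "$work/theirs"
commit G
commit H

cd "$work/ours"
D=$(git rev-parse master)
git fetch -q origin

git merge -q origin/master                 # D is the merge base AND the current commit

tip=$(git rev-parse master)
their_tip=$(git rev-parse origin/master)
merges=$(git rev-list --merges --count HEAD)
```

    If you want Git to refuse anything other than a fast-forward, git merge accepts the --ff-only option.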

    About git pull (don't use it)

    The git pull command is meant to be a convenience short-cut. And it is convenient sometimes, but it's also a trap.

    For a long time, in old versions of Git, there were numerous bugs in git pull that would destroy your work on some occasions. I believe these are all fixed, so that's not really a problem if you have a modern Git. But it has several other drawbacks, such as hiding the fact that all it does is run git fetch followed by a second command, usually git merge.

    If you use git pull, you don't learn what git fetch does, and don't realize you are running git merge. Everything seems excessively magic. Moreover, if the git merge step fails—and eventually it will—you may be quite helpless: you won't know that you are in the middle of a conflicted merge, much less how to read up on what to do about that.

    Last, while it's minor, the syntax for git pull is weird. This is because it actually predates the invention of remotes and remote-tracking branch names. (In fact, that's why it seems like pull, not fetch, should be the opposite of push: originally, it was!)

    Instead of:

    git fetch origin
    git merge origin/master
    

    (which makes sense), you run:

    git pull origin master
    

    Why is this origin master and not origin/master? Or, if you're vaguely aware that there's a git fetch step involved, why isn't it git pull origin origin/master? Why do we git merge origin/master but git pull origin master? The answers all have to do with the ancient history of Git, and none of them are really all that useful, except that they explain why git pull origin master br1 is a really bad idea (don't do it!).

    If you avoid git pull entirely (and remember that it's just git fetch followed by a second Git command), you will learn git fetch and the other Git commands. Once you really understand them, you can start using git pull if you find it more convenient: you'll know, when it has gone wrong, what to do. But until then, I recommend avoiding it.
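
    One practical benefit of the two-step form is that you can look at what arrived before merging it. The following sketch assumes scratch repositories named theirs and ours and a hypothetical commit helper; it counts the incoming commits between the fetch and the merge.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q theirs
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
for name in A B C D; do commit "$name"; done
cd "$work"
git clone -q theirs ours
cd "$work/theirs"
commit G
commit H

cd "$work/ours"
commit F

git fetch -q origin                        # step 1: only updates origin/master

# Inspect what they have that we do not, before touching our branch.
git log --oneline master..origin/master
incoming_before=$(git rev-list --count master..origin/master)

git merge -q -m "merge origin/master" origin/master   # step 2: the actual merge

incoming_after=$(git rev-list --count master..origin/master)
```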
