Git blind spot between staging and remote

时光毁灭记忆、已成空白 submitted on 2019-12-12 02:44:16

Question


I'm relatively new to Git, having come from a VSS / TFS / SVN background. I'm using the Git plugin in Visual Studio 2015.

So I clone a repository and make some changes to a code file. Then I look at the changes tab (which I presume does a git diff in the background) and I see the changes I made. All fine and dandy.

Then I commit the changes to my local repo, after which I make some more changes to a local file, view the changes, and commit again.

My problem is that there's now a blind spot between my local repo and the remote one - I've lost visibility of the differences between local and remote. This has been confusing me since I started working with git.

Am I doing it all wrong? As far as I know, "git diff" shows the changes between my local working code base and my local repo. How do I easily see the difference between the changes that have been accumulating in my local repo and the code in the remote repository?


Answer 1:


You will want to diff commits. In particular, you will want to diff the tip-most commit in another repository against the tip-most commit in your own repository. In the general case, though, you cannot do that until you copy that commit (the other repository's commit) into your repository. This is because you are always working in your own repository, except for the brief periods when you connect your Git to a second Git with a second repository.

This might be best illustrated visually, which requires a graphical representation of branching. Branching, at least in terms of its physical implementation, is very different in Git than it is in SVN. Consider this drawing:

             o--o--U   <-- commits you made
            /
...--o--o--B    [shared with another repository at the start]
            \
             o--T   <-- commits they made when you weren't looking

I gave letters to three of the "most interesting" commits (the others are just round o nodes). Newer commits are towards the right. The commits you have made are on the top line. Commits "they" made, that you have to bring in before you can do comparisons, are on the bottom line. Commits that both you and they had in common before you started all this are on the middle line.

I believe you are asking how you compare the contents of U (for Us or oUrs) to the contents of T (for Them). The way to do that is with git diff plus the hash IDs of the two commits U and T. Note, however, that you can compare any pair of commits, such as B (for Base) vs U and B vs T, as well. When you merge your work with their work, Git will automatically compare B-vs-U and B-vs-T.

To get the bottom row of commits you must run git fetch.
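The drawing can be reproduced end to end in throwaway repositories. What follows is a minimal sketch, not taken from the answer: it assumes bash and Git are installed, uses a local directory named `theirs` as a stand-in for the other repository, and assumes the branch is called `master` (Git's default when this was written; newer setups often use `main`). After the fetch, `origin/master` is the name your Git uses for T, and `git merge-base` locates B.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)               # throwaway directory for both repositories
trap 'rm -rf "$tmp"' EXIT

# "Their" repository, standing in for the remote, holds the shared commit B.
git -c init.defaultBranch=master init -q "$tmp/theirs"
echo base > "$tmp/theirs/file.txt"
git -C "$tmp/theirs" add file.txt
git -C "$tmp/theirs" commit -q -m B

# Your clone; its remote named "origin" is the directory above.
git clone -q "$tmp/theirs" "$tmp/mine"

# You make commit U...
echo ours > "$tmp/mine/file.txt"
git -C "$tmp/mine" commit -q -am U
# ...while they make commit T when you weren't looking.
echo theirs > "$tmp/theirs/file.txt"
git -C "$tmp/theirs" commit -q -am T

# T does not exist in your repository until you fetch it.
git -C "$tmp/mine" fetch -q origin

U=$(git -C "$tmp/mine" rev-parse HEAD)           # your latest commit
T=$(git -C "$tmp/mine" rev-parse origin/master)  # their latest commit
B=$(git -C "$tmp/mine" merge-base "$U" "$T")     # last commit in common

git -C "$tmp/mine" diff "$T" "$U"   # T vs U
git -C "$tmp/mine" diff "$B" "$U"   # what you changed since B
git -C "$tmp/mine" diff "$B" "$T"   # what they changed since B
```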

If you are coming from SVN, many of these concepts themselves will be unfamiliar.

Git vs SVN: a 30,000-foot overview

I will expand a bit on what I said in the comments, while still sticking to a high-level view: one that, as much as possible, avoids mentioning branches, which SVN implements very differently. For a slightly deeper rapid introduction to Git branches, see https://stackoverflow.com/a/44081446/1256452 (an answer to "need clarification on pulling git branches").

One repository vs many repositories

In SVN, there is only one actual repository. That repository, which holds every commit ever, is "far away" on some server somewhere, rather than on your computer. To work on something, you grab a bunch of code out of the repository. This puts the code on your own machine, where you can work on it. You do some work and when you are ready, you tell your machine:

  • Show me what I have, compared to what's on the server
  • Send what I have back to the server to make a commit

Git usage is very different. There is not just one repository. There is one repository per user, or perhaps even more-than-one per user (you can make as many clones as you like, and each one is a repository in its own right). Your repository has every commit ever, and so does every other copy (every "clone"). Of course, each clone starts out the same as the original, but they do drift apart over time. How fast they drift apart depends on how active each clone is.
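That each clone is a repository in its own right can be checked directly: delete the original after cloning, and the clones still hold every commit. A minimal sketch, assuming bash and Git are installed; the directory names and the demo identity are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# An "original" repository with three commits.
git -c init.defaultBranch=master init -q "$tmp/original"
for i in 1 2 3; do
  git -C "$tmp/original" commit -q --allow-empty -m "commit $i"
done

# Two clones, each a complete repository.
git clone -q "$tmp/original" "$tmp/clone1"
git clone -q "$tmp/original" "$tmp/clone2"

# Delete the original: both clones still have every commit.
rm -rf "$tmp/original"
git -C "$tmp/clone1" rev-list --count HEAD   # 3
git -C "$tmp/clone2" rev-list --count HEAD   # 3

# New work in one clone is invisible to the other: they start to drift apart.
git -C "$tmp/clone1" commit -q --allow-empty -m "only in clone1"
```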

Because separate repository clones exist, you need something SVN doesn't: a way to synchronize two repositories. In Git, these operations are called push and fetch. (You might think they should be push and pull, and in Mercurial they are, but Git kind of over-defined pull initially and had to back up and introduce fetch. Don't use Git's pull right away: it mixes together two different ideas, fetching and then merging (or rebasing), and will lead you astray.)

Remember that whenever you use these operations, there are two Gits involved, with two repositories. Each of these repositories is complete in and of itself, with its own full set of commits and its own ways to name those commits. (Git calls these names "branches" and "tags". Their purposes are similar to SVN's, but they work very differently, so initially you'll be best served by thinking about commits rather than "branches".)

When you first make your own clone, or run git fetch to update your clone from its origin (which is cleverly named origin), your Git copies their Git's new commits, keeping the same hash IDs, and renames their Git's branch names: their master, for instance, becomes your origin/master. Since you have just cloned, or just run git fetch, your repository is now the same as, or a superset of, theirs. You now have everything they do, plus perhaps a bit more.
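The renaming can be observed with a sketch like the one below, assuming bash, Git, a branch named `master`, and a local directory standing in for the remote. After `git fetch`, their `master` shows up as `origin/master` while your own `master` stays where it was; the final `git merge` is the half of `git pull` that fetch leaves out.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first"
git clone -q "$tmp/theirs" "$tmp/mine"

# They add a commit after you cloned.
git -C "$tmp/theirs" commit -q --allow-empty -m "second"

before=$(git -C "$tmp/mine" rev-parse master)
git -C "$tmp/mine" fetch -q origin
after=$(git -C "$tmp/mine" rev-parse master)

# Your Git lists their branch under the renamed form "origin/master".
git -C "$tmp/mine" branch -r

# Fetch copied the new commit and moved origin/master only: your master is
# unchanged ("$before" equals "$after"). Merging is the other half of "git pull".
git -C "$tmp/mine" merge -q --ff-only origin/master
```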

Hence, when you want to see what you have that they don't—particularly, how your most recent commit compares to theirs—you simply do:

git fetch origin    # get me their latest things

and then:

git diff <their-latest-commit> <my-latest-commit>

to compare their highest revision to yours.
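Applied to the situation in the question (two local commits, nothing pushed), the placeholders can be filled in with the name `origin/master` for their latest commit and `HEAD` for yours. This is a sketch under assumptions: bash and Git, a branch named `master`, a hypothetical file `code.cs`, and a local directory standing in for the server. The range `origin/master..HEAD` selects commits reachable from `HEAD` but not from `origin/master`, which is exactly the "blind spot" list of commits you have not pushed, as of your last fetch.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The "remote" repository, with the code as you first cloned it.
git -c init.defaultBranch=master init -q "$tmp/remote"
echo v1 > "$tmp/remote/code.cs"
git -C "$tmp/remote" add code.cs
git -C "$tmp/remote" commit -q -m "initial"
git clone -q "$tmp/remote" "$tmp/local"

# Two rounds of edit-and-commit, as in the question; nothing is pushed.
echo v2 > "$tmp/local/code.cs"
git -C "$tmp/local" commit -q -am "first change"
echo v3 > "$tmp/local/code.cs"
git -C "$tmp/local" commit -q -am "second change"

git -C "$tmp/local" fetch -q origin                    # get their latest things
git -C "$tmp/local" log --oneline origin/master..HEAD  # your unpushed commits
git -C "$tmp/local" diff origin/master HEAD            # their latest vs yours
git -C "$tmp/local" status -sb                         # first line reports "ahead 2"
```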

Revision numbering: why Git hash IDs are weird

There is a separate problem here, which I won't get into deeply, but will touch on. In SVN, "highest revision" is easy to tell, because there's only one repository and revisions are numbered sequentially. If your revision is number 128 and theirs is number 127, obviously yours is later than theirs; if yours is 128 and theirs is 129, obviously theirs is later than yours. But in Git, when you git fetch, you have two repositories. What if, when you cloned, there were 127 commits, and since then you and they have each added one commit? Now you both have 128 commits. You then cross-connect their Git to your Git and yank in their new commit, and now you have 129 commits while they have 128. What happens?

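The first and most important thing that happens is that commits aren't numbered sequentially at all, so your Git and their Git don't have to fight over who gets to call this "revision 128". Instead, Git commit IDs are big, ugly, incomprehensible hash IDs. The exact hash ID is something Git computes when you make the commit: it is a cryptographic hash (SHA-1 by default) of the commit's entire contents, including its snapshot, its parent commit IDs, its author, committer and timestamps, and its message. Two independently made commits almost always differ in at least one of these, so your new commits and their new commits get different hash IDs; an accidental collision between different commits is so unlikely that Git simply relies on it never happening.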
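The effect of hashing the commit's contents can be shown with fixed timestamps. In this sketch (assuming bash and Git; the helper `make_commit`, the names, and the dates are all made up), two repositories built from identical inputs produce the identical hash ID, while a one-second difference in timestamp produces a completely different one. The script ignores the reader's own Git configuration so that settings such as commit signing cannot change the result.

```shell
#!/usr/bin/env bash
set -euo pipefail
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Ignore the reader's own Git configuration so the results are reproducible.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
# Made-up identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: a fresh one-file repository with one commit made at a
# fixed time (Git's internal "<unix-seconds> <offset>" format); prints its hash ID.
make_commit() {
  local dir=$1 when=$2
  git -c init.defaultBranch=master init -q "$dir"
  echo same > "$dir/file.txt"
  git -C "$dir" add file.txt
  GIT_AUTHOR_DATE=$when GIT_COMMITTER_DATE=$when \
    git -C "$dir" commit -q -m "commit 128"
  git -C "$dir" rev-parse HEAD
}

a=$(make_commit "$tmp/a" "1495533600 +0000")
b=$(make_commit "$tmp/b" "1495533600 +0000")   # identical inputs to a
c=$(make_commit "$tmp/c" "1495533601 +0000")   # one second later

echo "a: $a"
echo "b: $b"   # same as a
echo "c: $c"   # entirely different
```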

This is what allows you to cross-connect the two Gits in the first place: initially, your Git and their Git exchange just their latest hash IDs. These, plus algorithmic tricks using properties of Directed Acyclic Graphs, mean that your Git and their Git can quickly determine all the commits (and other Git objects) that need to be copied to complete a fetch or push. Even if the initial clone of a repository takes many minutes or hours, a fetch or push can usually complete in a few seconds at most (though this depends on the amount of data that you must transfer, how fast your two computers are, and how fast the network is between them).
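The first step of that exchange can be seen with `git ls-remote`, which asks the other Git for its branch tip hash IDs without copying any commits. A sketch assuming bash, Git, a branch named `master`, and a local directory standing in for the other repository. The real protocol then trades "want" and "have" lines; this sketch only shows the outcome of that negotiation, namely the set of commits that had to be copied.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "shared"
git clone -q "$tmp/theirs" "$tmp/mine"

# They add three commits that you do not have yet.
for i in 1 2 3; do
  git -C "$tmp/theirs" commit -q --allow-empty -m "their commit $i"
done

# Ask their Git for its branch tips: hash IDs only, no commits copied yet.
git -C "$tmp/mine" ls-remote origin
tip=$(git -C "$tmp/mine" ls-remote origin refs/heads/master | cut -f1)

git -C "$tmp/mine" fetch -q origin

# Commits reachable from their tip but not from yours: what the fetch had to copy.
git -C "$tmp/mine" rev-list --count HEAD..origin/master   # 3
```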

In this particular example (where you both added commit "128"), these new commits go "in parallel" when compared to older ones:

             o   <-- your 128th commit
            /
...--o--o--o   <-- 127 commits in a row here
            \
             o   <-- their 128th commit

This forking of commit streams is a kind of branch and leads to what Git calls "branches", which as I said I am not going to get into here. But essentially, you compare your 128th (or higher) commit to their 128th (or higher) commit, and/or to the merge base commit, which is the last commit that you both had in common before you started to drift apart by adding new commits.
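The 127/128/129 scenario can be replayed literally. This sketch (assuming bash, Git, a branch named `master`, and a local directory in place of their repository) builds 127 shared commits, adds one "128th" commit on each side with different messages, fetches, and counts. It also shows `git merge-base` finding the 127th commit, the last one you had in common.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# 127 commits in a row, shared by both sides.
git -c init.defaultBranch=master init -q "$tmp/theirs"
for ((i = 1; i <= 127; i++)); do
  git -C "$tmp/theirs" commit -q --allow-empty -m "commit $i"
done
git clone -q "$tmp/theirs" "$tmp/mine"

# Each side adds a "128th" commit. The messages differ, so the hash IDs differ.
git -C "$tmp/mine" commit -q --allow-empty -m "your commit 128"
git -C "$tmp/theirs" commit -q --allow-empty -m "their commit 128"

git -C "$tmp/mine" fetch -q origin

git -C "$tmp/mine" rev-list --count HEAD            # 128: your line of history
git -C "$tmp/mine" rev-list --count origin/master   # 128: their line of history
git -C "$tmp/mine" rev-list --count --all           # 129: everything you now hold
base=$(git -C "$tmp/mine" merge-base HEAD origin/master)
git -C "$tmp/mine" log -1 --format=%s "$base"       # commit 127
```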

Public vs private commits, or, a few words about making all this function correctly

To make all this work, Git depends on the idea that whatever happens with these repository clones, you only add new commits.

This is sort of a lie: the word depends here is too strong. When you cross-connect two Git repositories, the one that acquires new commits just gets the new ones. If the user of the other Git has thrown away some commits, then one of two things holds:

  • You already had the commits they threw out. You still have them. You have merely added new commits to your technological distinctiveness.

  • You never had the commits they threw out. You still don't have them. You have merely added their remaining new commits to your technological distinctiveness.

The principle here should be clear, or at least clear-ish, though: your Git is not throwing anything away when you connect your Git to their Git. Your Git is only adding new things.
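The add-only behavior of fetch can be demonstrated by having "them" throw away a commit you already fetched. A sketch assuming bash, Git, a branch named `master`, and a local directory for their repository: the second fetch moves `origin/master` back, but the discarded commit is still in your repository. (Git's garbage collector may eventually remove commits that nothing refers to any more, but fetching itself never deletes them.)

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "kept"
git clone -q "$tmp/theirs" "$tmp/mine"

# They make a commit, and you fetch it.
git -C "$tmp/theirs" commit -q --allow-empty -m "later thrown away"
doomed=$(git -C "$tmp/theirs" rev-parse HEAD)
git -C "$tmp/mine" fetch -q origin

# They throw that commit away.
git -C "$tmp/theirs" reset -q --hard HEAD~1

# Your next fetch moves origin/master back, but deletes nothing from your repository.
git -C "$tmp/mine" fetch -q origin
git -C "$tmp/mine" rev-parse origin/master   # no longer the discarded commit
git -C "$tmp/mine" cat-file -t "$doomed"     # prints "commit": you still have it
```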

The same normally goes for a git push operation: your Git will give their Git new commits, not take some away. It's very difficult (and somewhat error-prone) to take things away. It can be done, via what Git calls a force push, but the mechanism is somewhat involved and requires getting down into the details of Git's branches and the other things Git calls references. So in general, you should endeavor to only add new commits, or at least to only add new commits once you have published a commit by giving it to another Git.
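Here is a sketch of that refusal, assuming bash, Git, a branch named `master`, and a bare repository `server.git` in a temporary directory playing the shared server; the users "alice" and "bob" are hypothetical. Alice's first push is rejected because accepting it would drop Bob's commit; after fetching and merging, her push only adds commits and goes through. A force push (for example `git push --force-with-lease`) would override the refusal and is deliberately not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A bare repository plays the shared server; seed it with one commit.
git -c init.defaultBranch=master init -q "$tmp/seed"
git -C "$tmp/seed" commit -q --allow-empty -m base
git clone -q --bare "$tmp/seed" "$tmp/server.git"
git clone -q "$tmp/server.git" "$tmp/alice"
git clone -q "$tmp/server.git" "$tmp/bob"

# Bob publishes a commit.
git -C "$tmp/bob" commit -q --allow-empty -m "bob's work"
git -C "$tmp/bob" push -q origin master
bob_commit=$(git -C "$tmp/bob" rev-parse HEAD)

# Alice, who has not fetched, commits and tries to push. Accepting it would
# drop Bob's commit from master, so Git refuses (non-fast-forward).
git -C "$tmp/alice" commit -q --allow-empty -m "alice's work"
if git -C "$tmp/alice" push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# The add-only route: fetch, merge Bob's commit in, then push again.
git -C "$tmp/alice" fetch -q origin
git -C "$tmp/alice" merge -q --no-edit origin/master
git -C "$tmp/alice" push -q origin master
echo "first push: $first_push"
```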

When you first add new commits to your repository, these new commits are only in your repository. At this point, you can discard them,¹ or copy-and-replace them,² or whatever, with great freedom. But once you share those commits with another Git, that other Git has those commits, which it knows by their magic hash IDs. It's now very difficult to retract them.

Unless you have your own server and allow others to git fetch from you, you generally publish commits using git push. So if you have not yet pushed a commit (if you have only used git fetch), you are free to rework it. (Mercurial keeps track of whether a commit has been published, using something called a commit phase. Git doesn't, but Git's reference system can help you out here, in particular what Git calls remote-tracking branches, which is a terrible phrase. But that, again, gets into branches, which I am trying to avoid.)
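One way to use remote-tracking names for this is to ask whether a commit is reachable from `origin/master`. The sketch below assumes bash, Git, a branch named `master`, a temporary bare repository as the server, and a hypothetical helper `is_published`. The answer is only as fresh as your last fetch or push, and it checks just one remote branch.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/seed"
git -C "$tmp/seed" commit -q --allow-empty -m base
git clone -q --bare "$tmp/seed" "$tmp/server.git"
git clone -q "$tmp/server.git" "$tmp/mine"

git -C "$tmp/mine" commit -q --allow-empty -m "pushed"
pushed=$(git -C "$tmp/mine" rev-parse HEAD)
git -C "$tmp/mine" push -q origin master

git -C "$tmp/mine" commit -q --allow-empty -m "local only"
local_only=$(git -C "$tmp/mine" rev-parse HEAD)

# Hypothetical helper: succeeds if commit $1 is reachable from origin/master,
# i.e. it was already on the server as of your last fetch or push.
is_published() {
  git -C "$tmp/mine" merge-base --is-ancestor "$1" origin/master
}

git -C "$tmp/mine" fetch -q origin
for c in "$pushed" "$local_only"; do
  if is_published "$c"; then echo "$c published"; else echo "$c not published"; fi
done
```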

Note that it's easy to undo a commit without taking it back: just add a new commit that reverses the effect of the previous commit. Git calls this a revert; Mercurial calls it a backout. Either way, though, you're adding new commits, which makes everything work smoothly. So once you have published commits, you can revert them, and there are no complicated "everyone must agree to retract" issues with that. The only drawback is that now you have a mistake and its correction, published forever for the world to see. :-)
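A revert in practice: this sketch (assuming bash and Git; the file `settings.txt` and the identity are made up) records a mistaken commit, then undoes it with `git revert`, which adds a third commit. The file ends up back at its earlier content while the history keeps both the mistake and its correction.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/repo"
echo good > "$tmp/repo/settings.txt"
git -C "$tmp/repo" add settings.txt
git -C "$tmp/repo" commit -q -m "good settings"

# A mistake that (we assume) has already been published, so it is not discarded.
echo mistake > "$tmp/repo/settings.txt"
git -C "$tmp/repo" commit -q -am "oops"
bad=$(git -C "$tmp/repo" rev-parse HEAD)

# Undo it by adding a new commit that reverses its effect.
git -C "$tmp/repo" revert --no-edit "$bad"

git -C "$tmp/repo" log --oneline   # three commits: the mistake and its correction
```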


¹ To discard commits, you can use git reset, though I will note here that git reset is a bit of a Frankenstein's monster of a command, made of three different parts. It doesn't necessarily discard commits: you can use it to discard uncommitted work instead, or at the same time. In fact, you can even use it to resurrect commits!
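The three uses of `git reset` mentioned in this footnote can be tried in a scratch repository. This sketch assumes bash and Git and a made-up file `notes.txt`; it only uses `--hard`, which also overwrites the working tree (the `--soft` and `--mixed` modes move the branch while keeping your changes). The resurrection step relies on the reflog, where `HEAD@{1}` names the commit `HEAD` pointed to one step earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/repo"
echo one > "$tmp/repo/notes.txt"
git -C "$tmp/repo" add notes.txt
git -C "$tmp/repo" commit -q -m "keep me"
echo two > "$tmp/repo/notes.txt"
git -C "$tmp/repo" commit -q -am "unpublished, to be discarded"
discarded=$(git -C "$tmp/repo" rev-parse HEAD)

# 1. Discard uncommitted work: put the working tree back to match HEAD.
echo scratch > "$tmp/repo/notes.txt"
git -C "$tmp/repo" reset -q --hard

# 2. Discard the latest (unpublished) commit: move the branch back one commit.
git -C "$tmp/repo" reset -q --hard HEAD~1
kept=$(cat "$tmp/repo/notes.txt")            # "one"

# 3. Resurrect it: the reflog remembers where HEAD was one step ago.
git -C "$tmp/repo" reset -q --hard 'HEAD@{1}'
```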

² To replace an entire chain of commits with an entire new chain (the "chain" can be as short as you like, even just one commit long), use git rebase. Rebase is a bit complicated technically; it's probably wiser to start with git merge, since each copied commit is technically made by merging. Note that those incomprehensible hash IDs change in the new copies, since the new commits are different from the originals. Merging avoids the copy-and-change-ID problem by (normally) creating a new merge commit.
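The difference between copying and merging shows up in the hash IDs. In this sketch (assuming bash, Git, a branch named `master`, a made-up side branch `merged`, and a local directory for their repository), merging keeps your commit U with its original ID as a parent of a new merge commit, whereas `git rebase` copies U onto T and the copy gets a new ID.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Made-up identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m B
git clone -q "$tmp/theirs" "$tmp/mine"

# Your commit U and their commit T touch different files, so nothing conflicts.
echo mine > "$tmp/mine/a.txt"
git -C "$tmp/mine" add a.txt
git -C "$tmp/mine" commit -q -m U
echo theirs > "$tmp/theirs/b.txt"
git -C "$tmp/theirs" add b.txt
git -C "$tmp/theirs" commit -q -m T
git -C "$tmp/mine" fetch -q origin
old=$(git -C "$tmp/mine" rev-parse HEAD)

# Merge (on a side branch): U keeps its hash ID; a new merge commit joins U and T.
git -C "$tmp/mine" checkout -q -b merged
git -C "$tmp/mine" merge -q --no-edit origin/master
git -C "$tmp/mine" checkout -q master

# Rebase: U is copied to a new commit on top of T, with a new hash ID.
git -C "$tmp/mine" rebase -q origin/master
new=$(git -C "$tmp/mine" rev-parse HEAD)
echo "U before: $old"
echo "U after:  $new"
```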



Source: https://stackoverflow.com/questions/44132037/git-blind-spot-between-staging-and-remote
