git: how to get an old tagged version into master without losing history?

Submitted by China☆狼群 on 2021-01-29 18:20:43

Question


This is a very small project with only a master branch.

I (lightweight) tagged a version of the source that was in production and pushed the tag to origin.

Then I committed some changes to the master (which triggers a build onto our dev system so we can test it) and pushed these to origin.

Now I want master to contain the tagged version, something like "revert/reset", but I don't want to lose the changes I have made which may be useful at some point.

The answer to "How do I revert master branch to a tag in git?" says to do the following:

git checkout master
git reset --hard tag_ABC
git push --force origin master

I have no idea what this does, but it looks dangerous/drastic, and I am looking for a simpler (less likely to go wrong) solution.

Presumably I need to do something like check out the master branch and merge in the tag, or check out the tagged version and merge in master?

E.g.

$ git checkout master
$ git pull
$ git merge mytag
$ git push

Or would this get confused because the changes I want to back out are newer?

I have seen you can set the master branch "position tag" to a commit. So I am guessing I just need to do

git reset XXX

Where XXX is the commit number of the tagged commit. If this method works, how do I get the commit number of a tag (git hist or git history do not work on the Mac)? If this is so easy, why the force and hard stuff?

If I checkout my tag, and do git status, it says "HEAD detached at mytag"

If I don't follow a cookbook recipe, it usually ends in disaster, so I'm hoping someone has done this before.

UPDATE

I got several replies, which was great, but unfortunately none has a complete recipe. For lack of a better solution, I did this:

  1. Checked out the old tagged version.
  2. Cut and pasted the contents of the files I changed into OneNote.
  3. Checked out the head.
  4. Opened the modified files in an editor and pasted in the contents from OneNote.
  5. Committed the changes to master.

I am sure there are better ways.


Answer 1:


To keep a pointer to the current head commit of your master branch, just create a branch there:

git branch new/features master
git push origin new/features

After that, you can git reset --hard and git push -f master all you want.
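You can rehearse the recipe above in a scratch setup before touching the real repository. This is a minimal sketch that assumes a POSIX shell and git on PATH. A local bare repository stands in for origin, and mytag, new/features, app.txt, and the demo identity are hypothetical names. The final push uses --force-with-lease (explained in the next answer) instead of plain -f.

```shell
set -e
top=$(mktemp -d)                                  # scratch area; nothing real is touched
git init -q --bare "$top/origin.git"              # stand-in for the real "origin"
git init -q "$top/work" && cd "$top/work"
git symbolic-ref HEAD refs/heads/master           # call the first branch "master" whatever init.defaultBranch says
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
git remote add origin "$top/origin.git"

echo v1 > app.txt && git add app.txt && git commit -q -m "production version"
git tag mytag                                     # lightweight tag on the production commit
echo v2 > app.txt && git commit -q -am "dev changes"
git push -q origin master mytag

# 1. Keep the newer commits reachable under a new branch name, locally and on origin.
git branch new/features master
git push -q origin new/features

# 2. Move master (plus the index and work-tree) back to the tagged commit.
git checkout -q master
git reset -q --hard mytag

# 3. Move origin's master too; the lease makes this fail if origin/master changed meanwhile.
git push -q --force-with-lease origin master

tagged=$(git rev-parse mytag)
saved=$(git rev-parse new/features)
git log --oneline --graph --all
```

Afterwards, master and origin's master both select the tagged commit and app.txt contains v1 again. The dev changes remain reachable through new/features.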




Answer 2:


This is a very small project with only a master branch.

This is, perhaps, the real error here. 😀 Branches in Git are cheap and good, and you should start using them. See LeGEC's answer for a recipe (but do read through all of this for caveats and enlightenment!).

One thing to realize is that branches in Git don't actually mean anything. That is, there's nothing special about master except that people start with it.1 This is why they're so cheap. The only real meaning to any branch name is whatever meaning you give it.

If I don't follow a cook book recipe, it usually ends in disaster ...

What this means is that you should learn what Git really does here. It's only a little bit complicated!


1Well, the fact that Git uses it by default, as the starting name, is something you could call "special". Note that this is about to change, though—GitHub in particular are reported to be switching to main, and Git is growing a feature in which you can set the name (to main like GitHub, for instance) in your system or per-user configuration (this is more or less what GitHub plan to use to set their new default). There are a few other minor quirks with this, and it has taken a few rounds of review to find and tweak all the places where something funky happens due to a built-in comparison of the six letter sequence master. But other than that there's nothing special about master.


Git is all about commits

As a casual user of Git, you probably think of Git as being about branches and/or files. That's where you get led astray, and why things end in disaster: Git isn't about files, and is not about branches either. Git stores files, and uses branch names, but it stores the files in commits, and uses branch names to find the commits. In the end, everything is about the commits themselves.

There are some things to know about commits:

  • Every commit has a unique number. The numbers aren't simple counting numbers though: they don't start with commit #1 and count up to #2, #3, and so on. Instead, each commit gets a random-looking—but not actually random at all—big ugly hash ID like 385c171a018f2747b329bcfa6be8eda1709e5abd.

    These numbers have to be so big and ugly because that number means that commit, from now on. None of your commits can use that number.2 A commit's number—its hash ID—is actually a cryptographic checksum of the contents of that commit. This means that no commit, once made, can ever be changed, which is hugely consequential.

    Git can look up a commit—or any internal Git object, but we'll just worry about commits here—by the big ugly number. The Git repository is mostly just a big key-value database with the hash ID "numbers" being the keys (well, plus a second database of names-to-hash-IDs, which we'll get to in a bit).

  • Each commit stores two things:

    • It stores a full snapshot of every file that Git knows about (or knew about at the time you, or whoever, made the commit, that is). The files inside a commit are stored in a compressed and de-duplicated form. Since the files are all read-only—every part of a commit is read-only, as we just noted—it's OK to share them with other commits, with this de-duplication. So it doesn't actually hurt to store the same file a thousand times in a thousand commits: they all just re-use the one version of that file. It's only when files change that a new commit has to store a new version.

    • It stores some metadata, or information about the commit itself. That includes the name and email address of whoever makes the commit, for instance. There's a date-and-time-stamp—actually two of these—in each commit as well, and your log message goes here. Most important for Git itself, though, each commit stores, in this metadata, the big ugly hash ID of the previous commit.
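You can inspect both parts of a commit directly. This is a minimal sketch that assumes a POSIX shell and git on PATH, run in a throwaway repository (the file name and identity are hypothetical). git cat-file -p prints the tree line (the snapshot) followed by the metadata, including the parent line. Re-hashing the raw commit bytes reproduces the hash ID, which is why a commit cannot be changed once it is made.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"                   # throwaway repository
git init -q
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
echo hello > greeting.txt && git add greeting.txt && git commit -q -m first
echo bye >> greeting.txt && git commit -q -am second

# Raw commit: "tree <id>" (the snapshot), "parent <id>", author, committer, blank line, message.
git cat-file -p HEAD

# The hash ID is a checksum of exactly those bytes, so hashing them again gives the same ID.
head=$(git rev-parse HEAD)
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
```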

It's this last bit that makes everything work—or break, when you have a disaster. 😀 To understand what's going on here, let's draw a simple, and very small, Git repository, in which we have just three commits. Because actual hash IDs are too big and ugly, let's call these commits A, B, and C, and draw them like this:

A <-B <-C

Commit C (whatever its hash ID really is) is the last commit, and it stores a bunch of files and some metadata, all of which are frozen for all time now. Inside commit C's metadata, Git has stored B's hash ID. So from C, Git can work backwards one hop, to find B. Meanwhile B has files and metadata, and in B's metadata, Git stored A's actual hash ID. So from B, Git can step back to commit A.

Git calls these backwards-pointing links parents. The parent of the last commit C is B, and the parent of B is A. As you can see, Git actually works backwards. We start with the last commit—which so far, is C—and go back one commit at a time. Commit A, being the very first commit, is special in exactly one way: its metadata doesn't list a previous commit. It has no parent (it's an orphan, sort of). That's how Git knows that it can stop going backwards. But there is one hitch here: how do we find the actual hash ID of commit C?


2Technically, your Git could re-use that number, as long as you never introduce your Git to a Git holding the repository for Git itself. For much more about this, see How does the newly found SHA-1 collision affect Git?
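You can imitate Git's backwards walk by hand. This is a minimal sketch that assumes a POSIX shell and git on PATH, in a throwaway repository with three commits A, B, and C (the file and identity are hypothetical). Starting from the last commit, it follows each parent link until rev-parse finds no parent, which happens only at the root commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
for name in A B C; do
  echo "$name" > file.txt && git add file.txt && git commit -q -m "$name"
done

c=$(git rev-parse HEAD)                   # start at the last commit, C
walk=""
while :; do
  walk="$walk $(git log -1 --format=%s "$c")"
  # "<id>^" names the parent; it fails to resolve only for the root commit.
  if p=$(git rev-parse --verify -q "$c^"); then c=$p; else break; fi
done
echo "visited:$walk"                      # prints: visited: C B A
root=$(git rev-list --max-parents=0 HEAD) # Git's own way to list parentless commits
```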


A branch name stores one commit hash ID

This is where branch names like master come in. Each name stores one hash ID. The hash ID inside a branch name is, by definition, the last commit in that chain. So we can re-draw the above as:

A--B--C   <-- master

The name master, which is easy for a human to remember, holds the actual hash ID of commit C. We'll check this commit out, and that will be our current commit, with master being our current branch. So now that we're on master, if we add a new commit—by the usual means that we haven't described here—what Git will do is:

  • package up a new snapshot;
  • add some metadata: name, email address, and so on;
  • include in that metadata, the hash ID of the current commit C; and
  • write all that out as a new commit, which will acquire a new unique big ugly hash ID, but we'll just call that D.

Let's draw that:

A--B--C   <-- master
       \
        D

As you can see, D's parent is C. Now git commit performs its special trick: the last step of git commit is to write D's new hash ID into the name master. The result is:

A--B--C
       \
        D   <-- master

which we can just draw out as:

A--B--C--D   <-- master
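You can watch this happen. This is a minimal sketch that assumes a POSIX shell and git on PATH, in a throwaway repository (the file and identity are hypothetical, and the first branch is forced to be called master). The name master resolves to a single hash ID. git commit then writes the new commit's ID into that name, and the old ID becomes the new commit's parent.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch "master"
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
for name in A B C; do
  echo "$name" > file.txt && git add file.txt && git commit -q -m "$name"
done

c=$(git rev-parse master)                 # the one hash ID stored under the name master
git for-each-ref refs/heads/              # "<hash ID> commit refs/heads/master"

echo D > file.txt && git commit -q -am D  # new commit D; master is rewritten to point at it
d=$(git rev-parse master)
d_parent=$(git rev-parse master^)         # D's parent is the old master, C
```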

Using more than one branch name

Let's go back to our three-commit repository, before we make D:

A--B--C   <-- master

Now, before we do make D, let's create a new branch name, dev for develop. A Git branch name must select some commit, so which of the three commits should we select? Well, the latest one makes a lot of sense, so let's use C, the commit we're using through the name master:

A--B--C   <-- master, dev

Now all three commits are on both branches. But now we have a problem with our drawing: which name are we using? We have two names! We need a way to tell which one we're actually using. Right now it's not super-important, because both names hold the hash ID of commit C, but we're about to make a new commit D.

Let's pick the name dev to use, with git checkout dev, and draw it like this:

A--B--C   <-- master, dev (HEAD)

Here, we've used the special name HEAD, in all uppercase, and attached it to one of the branch names. That tells us—and Git—which name we're using.

Now let's make commit D while we're using this dev name. Git will write out a new commit as before, but this time, the name it updates is dev, not master. So we end up with:

A--B--C   <-- master
       \
        D   <-- dev (HEAD)

New commit D is now the last commit in the dev branch. Commits A-B-C are now on both branches, with commit C being the last commit in the master branch.

That's all there is to it! Well, OK, almost all. There are several more wrinkles that will come up in a moment. But that's what branch names are all about: A branch name just holds the hash ID of the last commit in the chain. Git will start here, and work backwards whenever it needs to.
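Here is the same effect with two names, as a minimal sketch. It assumes a POSIX shell and git on PATH, in a throwaway repository with a hypothetical file and identity. git symbolic-ref HEAD shows which branch name HEAD is attached to, and a commit made on dev leaves master where it was.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
for name in A B C; do
  echo "$name" > file.txt && git add file.txt && git commit -q -m "$name"
done

git branch dev                            # master and dev now hold the same hash ID (C)
git checkout -q dev                       # attach HEAD to dev
attached=$(git symbolic-ref HEAD)         # refs/heads/dev
c=$(git rev-parse master)

echo D > file.txt && git commit -q -am D  # only the name HEAD is attached to (dev) moves
git log --oneline --graph --all
```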

Short sidebar: the index and your work-tree

To keep this answer shorter, I won't go into a lot of detail here, but think about the fact that every commit is frozen for all time. The files inside each commit are in a special Git-only de-duplicated format, that only Git can read. How can these files actually be any use? To be of use, files have to be readable by other programs, and usually at least a few of them need to be writable too.

All version control systems have this problem, and all of them use similar approaches: there's the version controlled "file", frozen for all time, and then there's a separate file that's actually usable. The usable files go in a work area. Git calls this work area your working tree or work-tree.

This means that the files you see and work with, when you're working with a Git repository, are not actually in the repository at all. They were copied out of the repository (by git checkout or git switch) so that you could use them, but now that they're out, they are literally outside the repository. Those aren't Git's files: they're yours.

Where Git departs from most version control systems, though, is that Git keeps a third copy—well, sort of a copy—of each file. This extra copy sits "between" the frozen file, in Git's commit, and the usable file in your work-tree. It's in Git's frozen and de-duplicated format, but it's not actually frozen, because it's not in a commit. This extra copy is in what Git calls, variously, the index, or the staging area, or sometimes—rarely these days—the cache. Because it's pre-de-duplicated, it's not really a copy (and what's inside Git's index is another one of those big ugly hash IDs, for an internal blob object, rather than the actual file data directly). But thinking of it as a copy works well.

When you run git add on some file you've changed in your work-tree, what you are really doing is telling Git: make the index copy of this file match the work-tree copy. Git will remove and replace the de-duplicated frozen-format file, making a new copy (but already de-duplicated, if it matches any previous version) of the file, ready to be committed.

Because the index holds the de-duplicated, ready-to-commit copies of each file, a good way to think of Git's index is that it holds your proposed next commit. Running git add is your way of telling Git: Change my proposed next commit now, using the updates I've made in my work-tree.
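You can compare the three copies directly. This is a minimal sketch that assumes a POSIX shell and git on PATH, in a throwaway repository (notes.txt and the identity are hypothetical). git ls-files -s shows the blob hash ID that the index holds for a file, and git rev-parse HEAD:notes.txt shows the committed one. git hash-object shows what the work-tree copy would hash to.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
echo one > notes.txt && git add notes.txt && git commit -q -m first

echo two > notes.txt                       # change only the work-tree copy
committed=$(git rev-parse HEAD:notes.txt)  # blob ID frozen in the commit
index_before=$(git ls-files -s notes.txt | awk '{print $2}')  # blob ID in the index
worktree=$(git hash-object notes.txt)      # blob ID the work-tree copy would get

git add notes.txt                          # make the index copy match the work-tree copy
index_after=$(git ls-files -s notes.txt | awk '{print $2}')
git diff --cached --stat                   # the proposed next commit now differs from HEAD
```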

Tags

Now that we have a good way to draw what's going on with commits, let's draw what a tag does. A tag name is a whole lot like a branch name: it holds one hash ID. In this case, that's a commit hash ID.

There are several key differences between branch names and tag names:

  • Branch names are forced to hold only commit hash IDs. Tag names can hold other kinds of hash IDs, and that's what an annotated tag is about. You get an internal Git object that holds extra information—the annotation—and then holds a hash ID: normally, a commit hash ID. So the annotated tag gets you a commit, but lets you add information first. You mentioned that you're using a lightweight tag, and those just hold commit hash IDs directly, so that's what I will draw here.

  • Branch names move, as we saw above when we made new commit D. Whatever branch name you have as your attached-HEAD, that's the name that moves. Tag names don't move.3

  • Branch and tag names are in different namespaces. We won't go into any detail here, but tag names are meant to be more "global" than branch names: every Git repository gets its own branch names, but in general, when you connect two clones and have them share, they tend to share their tag names so that everyone has the same ones.

Since tag names don't move, let's draw that. We'll start with this:

...--G--H   <-- master (HEAD)

and then we'll add a tag name, tag:ABC for instance, like this:

...--G--H   <-- master (HEAD)
        ^
        |
     tag:ABC

If we now create a new commit, we'll get:

...--G--H--I   <-- master (HEAD)
        ^
        |
     tag:ABC

Note that we could draw this like this:

          I   <-- master (HEAD)
         /
...--G--H   <-- tag:ABC

which emphasizes that tag names and branch names are a whole lot alike. You could have used a branch name, where you actually used a tag name. The distinctions—that tag names don't move, but branches do, and so on—are mostly for human use. Git itself doesn't really care: Git cares about the hash IDs.


3You can move a tag. There are several ways to do that, with the most obvious being: delete the tag, then create one that's spelled the same but that selects a different commit. This is often a bad idea, and the reason is that both humans and Git repositories don't expect tags to move. Anyone who grabbed the "wrong" tag earlier is likely to hang on to this wrong value: you'll have to convince them to delete-and-re-create, or otherwise move, their copy of the tag, too.
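A minimal sketch can check these tag differences and also answer the question's "how do I get the commit number of a tag". It assumes a POSIX shell and git on PATH, in a throwaway repository with hypothetical tag names light and annotated. git cat-file -t reports what kind of object each tag name holds. rev-parse with ^{commit} peels either kind of tag down to its commit hash ID. Neither tag moves when master gets a new commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false && git config tag.gpgSign false
echo v1 > app.txt && git add app.txt && git commit -q -m release

git tag light                                  # lightweight: holds the commit hash ID directly
git tag -a annotated -m "release notes"        # annotated: holds the hash ID of a tag object
light_type=$(git cat-file -t light)            # commit
annotated_type=$(git cat-file -t annotated)    # tag
release=$(git rev-parse "annotated^{commit}")  # peel the tag object down to its commit

echo v2 > app.txt && git commit -q -am "newer work"   # master moves on...
light_now=$(git rev-parse "light^{commit}")             # ...but the tags stay put
```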


Detached HEAD mode

You noticed that when you run git checkout tag_ABC, or whatever the actual spelling is for your ABC tag, you wind up in detached HEAD mode. That's because HEAD itself can only be attached to a branch name.

Branch names move, and the method by which they most often move is by having HEAD attached to them. Tag names are not supposed to move (see footnote 3 again), and to enforce that, Git won't attach HEAD to a tag name.

In general, you can also check out any historic commit, to view or use it in some way. For instance, suppose you decide you want to look at commit G for a while, or build it, or whatever. You can just direct Git to check out that commit by its raw hash ID—as seen in git log output, for instance—and you'll get this:

         I   <-- master
        /
       H   <-- tag:ABC   # drawn on right to save space
      /
...--G   <-- HEAD

A "detached HEAD" just means that the special name HEAD points directly to some commit. So if you now git checkout ABC, you get:

          I   <-- master
         /
...--G--H   <-- HEAD
        ^
        |
     tag:ABC

Your index and work-tree are full of files from commit H. Your HEAD identifies commit H, as does your tag. Meanwhile your name master still identifies commit I.

To get out of detached HEAD mode, you simply git checkout master or git switch master. This re-attaches HEAD to the branch name, and extracts the commit identified by the branch name—commit I in our drawings here—into Git's index and your work-tree, so that you see the files from that version.
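Here is a minimal sketch of entering and leaving detached HEAD mode. It assumes a POSIX shell and git on PATH, in a throwaway repository with a hypothetical tag ABC and file app.txt. git symbolic-ref -q HEAD fails while HEAD is detached and succeeds again once master is checked out.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
echo H > app.txt && git add app.txt && git commit -q -m H
git tag ABC
echo I > app.txt && git commit -q -am I

git checkout -q ABC                        # HEAD now holds commit H's hash ID directly
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
at_tag=$(git rev-parse HEAD)
file_at_tag=$(cat app.txt)                 # the work-tree has H's files

git checkout -q master                     # re-attach HEAD to master; I's files come back
attached=$(git symbolic-ref HEAD)
```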

Drastic? Perhaps

The other answer you linked includes:

git checkout master
git reset --hard tag_ABC
git push --force origin master

I have no idea what this does, but it looks dangerous/drastic ...

Dangerous, yes: in particular that --hard tells git reset to remove all seat belts and disable all the air bags, as it were, and the --force is similar. It's perhaps less drastic than it looks though.

The git reset command is terribly complicated, but we'll just look at the --hard mode here.4 What this does is actually three things:

  • First, it moves the current branch name. For this to have any effect, HEAD has to be attached to a branch name. That's why we have the git checkout master.

  • Then, it resets Git's index, so that the proposed next commit matches the commit you just moved to.

  • Last, it resets your work-tree, so that the files you see are those from the commit you just moved to. It does this without asking whether some of those files have stuff you never saved anywhere, and since those files are not in Git, any data that get overwritten, Git can't recover, either. That's the most dangerous or drastic part, right there.

The commit you choose—tag_ABC here—is the one that the name now selects, so after this git reset --hard, we have this picture:

          I   ???
         /
...--G--H   <-- master (HEAD)
        ^
        |
     tag:ABC

You might wonder: What happened to commit I? The answer is: Nothing at all. It's still there. But how will you find it?

If you jotted down the commit number—the hash ID—before your git reset, you could find commit I that way. Git also has various "recover from a mistake" logs and commands that will let you find commit I again. These keep track of the commit for at least another 30 days, by default. So the commit is still there. You can get it back!
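Recovering commit I after the reset can be sketched like this. It assumes a POSIX shell and git on PATH, in a throwaway repository with hypothetical names ABC and rescued. The branch's reflog records where master pointed before each move, so master@{1} still finds I.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
echo H > app.txt && git add app.txt && git commit -q -m H
git tag ABC
echo I > app.txt && git commit -q -am I
i=$(git rev-parse master)

git reset -q --hard ABC                    # move master, the index, and the work-tree to H
git reflog -n 2 master                     # newest entry: the reset; the one before: commit I
recovered=$(git rev-parse "master@{1}")    # where master pointed one move ago
git branch rescued "$recovered"            # give commit I a name again
```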

The git push --force is actually more drastic, but to see why, we need to talk about multiple Git repositories, and this part really does get a little complicated.


4I view this much the same way I viewed git checkout before it was split into git switch and git restore: it has too many modes. The new split-up commands are simpler because each one does only a few things. Reset probably should be split up as well.


Other Git repositories

We say that Git is a distributed version control system (DVCS). What this means is perhaps unclear. It might be better to refer to it as a replicated VCS: it's not distributed in the way that distributed computing is, for instance. In short, though, the way this works is that different Git repositories will connect to each other, and having connected, can now share—replicate—commits.

Since each commit has a unique number, the two Gits can decide whether one has a commit that the other has, just by passing around the numbers. That's what the main phase of a git fetch or git push is all about: one of the two Git repositories has some commits that the other one maybe doesn't. The sending Git offers the receiving Git the hash IDs. The receiving Git looks in its big database of Git objects, and tells the sender: please send that or no thanks, I already have it.

Each commit, of course, remembers the hash ID of its parent commit. The sending Git is obligated to offer the parent (or for merge commits, parents plural) of each commit it sends. So if you have three new commits that they don't, all in a row, and you tell your Git to send the last of these three, your Git will actually send all three.

Having received some new commits, though, the receiving Git now needs some way to find these commits. We already noted that one way we find commits is with branch names. So the receiving Git could set some branch name(s) to remember any new commits.

The git fetch and git push commands differ here in the way they work: when you run git fetch, your Git is the one receiving, and their Git is the one sending. Your Git doesn't set your branch names. Instead, your Git sets some other names. This is fancier (and nicer in many ways) than git push, but we'll skip right over this and consider git push instead.

When you run git push, your Git is the sender and their Git is the receiver. You send any new commits that you have, that they don't, that they will need. At the end of this process, your Git normally now sends a polite request: Now, if it's OK, please set your branch name ______ to ______. Let me know if that was OK. Your Git fills in the first blank with a branch name, and the second one with a hash ID.

The branch name your Git asks them to set comes from your git push command. If you run:

git push origin HEAD:master

the master here means their master branch. The HEAD here means that the commit you'll ask them to set is whatever commit is your current commit. (The origin part is the way you specify the Git you are sending to.)

When you use:

git push origin master

you're really saying master:master, i.e., you want your Git to find your master commit—the last one on the chain ending at your master—and send that commit, and then ask them to set their master.
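You can try both push spellings against a local stand-in. This is a minimal sketch that assumes a POSIX shell and git on PATH, with a bare repository in a temp directory playing origin (the file and identity are hypothetical). git ls-remote asks the other Git what its master holds. A successful push also updates your own origin/master.

```shell
set -e
top=$(mktemp -d)
git init -q --bare "$top/origin.git"       # stand-in for origin
git init -q "$top/work" && cd "$top/work"
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
git remote add origin "$top/origin.git"

echo H > app.txt && git add app.txt && git commit -q -m H
git push -q origin master                  # short for master:master
echo I > app.txt && git commit -q -am I
git push -q origin HEAD:master             # "my current commit" into their master

theirs=$(git ls-remote origin refs/heads/master | cut -f1)
tracking=$(git rev-parse origin/master)    # remote-tracking name, updated by the push
```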

So, suppose you and they both start out with:

...--G--H   <-- master

Your Git and their Git are in sync. But now you create a new commit I on your own master (whether or not you create a tag). You now have:

...--G--H--I   <-- master

If you run git push origin master, your Git calls up their Git and offers commit I. They don't have that one, so they say please send it. Your Git now offers H, because I's parent is H; theirs says no thanks, I have that already. Your Git now knows that they have G and everything earlier too, because H is the last commit in a chain, and they must have the entire chain.5 All of this fancy footwork essentially allows your Git to send, not the entire I commit, but only the parts of the I commit that they don't already have. It's remarkably efficient, and it all comes about by exchanging just two hash IDs.

Anyway, your Git sends over commit I—or just enough to let them reconstruct it—and they now have I. Now your Git asks their Git to please, if it's OK, have them set their master to remember commit I.

They will say that this is OK, and the reason they will say that is that this just adds to their collection. Starting from I, they can go back to H, so they won't lose H, nor G, nor anything earlier.

Note that when you git push a tag, your Git ends the conversation with a polite request that they create or update their tag of the same name. Except for the fact that they should not move a tag, but should move a branch name if it just adds on, this is all pretty much the same.


5This papers over the way shallow repositories work, but let's not worry about that now.


How, when, and why git push --force is dangerous

Suppose you've sent commit I to the other Git, and now decide to retract it, by using git reset as we saw above. You have, in your repository:

          I   <-- new/features
         /
...--G--H   <-- master (HEAD)

because you cleverly saved the hash ID of commit I in a new branch (see LeGEC's answer) before you did the git reset. Moreover, you also did a git push origin new/features, which had your Git call up their Git, offer them commit I—which they already have—and ask them to set their new/features to remember commit I. They said OK to that too. So right at that moment, they have:

...--G--H--I   <-- master, new/features

But we just said that they're taking git push commands. What if some third user has a third Git repository?

Suppose this third user has grabbed commit I and has used that to create a new commit J. This third user—let's call him Bob—made his repository have:

...--G--H--I--J   <-- master (HEAD)

He then runs git push origin master to send commit J to the repository you're about to git push --force. They accept commit J and add it to their master.

They now have:

...--G--H--I   <-- new/features
            \
             J   <-- master

Bob thinks: Great, my work is done! Then, for some reason, Bob removes his entire repository.6 Commit J is, after all, safe somewhere that's all backed up and everything, maybe on GitHub or whatever.7

Now you come along and offer the second repository commit H, which they already have, then ask them to set their master to point to H. By default, they will say no, and the reason is that this causes their master to drop commits I and J.

You, of course, know that they already have commit I, and you want them to drop it. So you use git push --force. This changes the last operation from Please, if it's OK to Do this now! I command it! If they obey this command—that's up to them, but usually they're set up to obey—they will dutifully change their master to point to H:

...--G--H   <-- master
         \
          I   <-- new/features
           \
            J   ???

In your Git repository, above, we noted that there are some ways for you to find commit I if you forgot to save its hash ID somewhere first. These methods rely on what Git calls reflogs. Server repositories normally have reflogs disabled, which means they don't have a way to find commit J any more.

Without a way to find commit J, they may quickly remove commit J entirely. Their repository drops commit J. Bob had commit J, but we just said Bob removed his repository too.

What happens here, then, is that Bob's commit J is lost, perhaps forever. If Bob keeps his repository, Bob still has his commit, and can restore it to this shared Git repository (on GitHub, or wherever it might be).


6This is probably a mistake. 😀

7It Ain’t What You Don’t Know That Gets You Into Trouble. It’s What You Know for Sure That Just Ain’t So.
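You can replay the Bob scenario with local repositories. This is a minimal sketch that assumes a POSIX shell and git on PATH. A bare repository in a temp directory plays the shared server, a second clone plays Bob, and all names are hypothetical. The server refuses the plain push and obeys the forced one. Afterwards, no branch on the server contains Bob's J.

```shell
set -e
top=$(mktemp -d)
git init -q --bare "$top/origin.git"                                   # the shared server
git --git-dir="$top/origin.git" symbolic-ref HEAD refs/heads/master    # so clones check out master
git init -q "$top/you" && cd "$top/you"
git symbolic-ref HEAD refs/heads/master
git config user.name you && git config user.email you@example.com
git config commit.gpgsign false
git remote add origin "$top/origin.git"
echo H > app.txt && git add app.txt && git commit -q -m H
echo I > app.txt && git commit -q -am I
git push -q origin master
git branch new/features master && git push -q origin new/features

# Bob clones, builds J on top of I, and pushes it.
git clone -q "$top/origin.git" "$top/bob"
(
  cd "$top/bob"
  git config user.name bob && git config user.email bob@example.com
  git config commit.gpgsign false
  echo J >> app.txt && git commit -q -am J
  git push -q origin master
)
j=$(git -C "$top/bob" rev-parse HEAD)

# You retract I: master goes back to H, then you try to make the server follow.
git reset -q --hard HEAD^
if git push -q origin master 2>/dev/null; then plain=accepted; else plain=rejected; fi
git push -q --force origin master          # "Do this now!": the server's master drops I and J
their_master=$(git --git-dir="$top/origin.git" rev-parse master)
```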


Is git push --force really dangerous?

Well, maybe. If we know for sure there's no Bob, nothing is lost; and if Bob is careful to keep his repository, Bob can restore the lost commit. As someone who has used shared repositories like this (and occasionally taken on the Bob role but without having removed the repository), I will say that being Repository Janitor is not all that much fun. As an occasional rescue, sure, it's OK. Just don't make me do it all the time. 😀

There is a less-dangerous alternative though. Instead of git push --force, consider using git push --force-with-lease. This rather odd name really means that the last request-or-command—please, if it's OK, set _____ to _____ or set _____ to _____!—changes to: I think your _____ is set to _____. If so, change it to _____. In any case, let me know. Your Git fills in all of these blanks:

  • The branch name comes from your git push remote mine:theirs command. The theirs after the colon—or the one name, if you omit the colon, that provides everything—is the branch name you ask their Git to set.

  • The I think yours is _____ blank gets filled in from your own Git's remote-tracking name. For instance, if you're pushing to master on origin, this is filled in from your own origin/master. You can inspect this value (with git log, typically) before you start the git push. That way you know exactly what commits you're going to ask them to throw away.

  • The set it to _____ blank gets filled in from the hash ID of the mine side of the colon, or from your branch name if you use just the one name for everything.

So git push --force-with-lease origin master means call up origin, then ask them to forcibly set their master, but only if it matches what I can see in my origin/master right now. So you can check before you force-push. If the force-push fails because your check was wrong, that means Bob (or whoever) managed to sneak a git push in between, and you'd best pick up Bob's new commit and figure out what to do about that, before you go force-pushing again.
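The same setup shows the lease at work. This is a minimal sketch under the same assumptions: a POSIX shell, git on PATH, a local bare repository as the server, and hypothetical names. Bob sneaks in a push after your origin/master was last updated, so --force-with-lease refuses. A fetch then shows the commits you would otherwise have thrown away.

```shell
set -e
top=$(mktemp -d)
git init -q --bare "$top/origin.git"
git --git-dir="$top/origin.git" symbolic-ref HEAD refs/heads/master
git init -q "$top/you" && cd "$top/you"
git symbolic-ref HEAD refs/heads/master
git config user.name you && git config user.email you@example.com
git config commit.gpgsign false
git remote add origin "$top/origin.git"
echo H > app.txt && git add app.txt && git commit -q -m H
echo I > app.txt && git commit -q -am I
git push -q origin master                  # your origin/master now records I

# Bob sneaks in a push of J; your origin/master still says I.
git clone -q "$top/origin.git" "$top/bob"
(
  cd "$top/bob"
  git config user.name bob && git config user.email bob@example.com
  git config commit.gpgsign false
  echo J >> app.txt && git commit -q -am J
  git push -q origin master
)
j=$(git -C "$top/bob" rev-parse HEAD)

git reset -q --hard HEAD^                  # retract I locally
git log -1 --format='lease expects: %h %s' origin/master
if git push -q --force-with-lease origin master 2>/dev/null; then lease=accepted; else lease=rejected; fi

# The lease failed, so pick up Bob's work and look at it before deciding what to do.
git fetch -q origin
git log --oneline master..origin/master    # lists J and I
```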




Answer 3:


A short answer, which should do the job:

git checkout -b my_tagged_branch tagname

checkout -b creates a new branch and checks it out. From the git checkout documentation:

$ git checkout v2.0  # or
$ git checkout master^^

   HEAD (refers to commit 'b')
    |
    v
a---b---c---d  branch 'master' (refers to commit 'd')
    ^
    |
  tag 'v2.0' (refers to commit 'b')

Notice that regardless of which checkout command we use, HEAD now refers directly to commit b. This is known as being in detached HEAD state. It means simply that HEAD refers to a specific commit, as opposed to referring to a named branch.

You can then work on my_tagged_branch and commit. If necessary, check out master again and then run git merge my_tagged_branch.
If you work with a remote, don't forget to push. If you'd like to be able to see this workflow in the history later on, use git merge --no-ff my_tagged_branch instead. The resulting files are the same, but a merge commit is recorded even when a fast-forward would have been possible (check with git log --graph --oneline).

For details, see torek's answer.
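Here is Answer 3's workflow as a minimal sketch. It assumes a POSIX shell and git on PATH, in a throwaway repository with hypothetical names tagname, app.txt, and hotfix.txt. Note that the merge keeps master's newer app.txt. This workflow adds work done on top of the tag to master; it does not roll master back to the tagged files.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
git config commit.gpgsign false
echo v1 > app.txt && git add app.txt && git commit -q -m "production"
git tag tagname
echo v2 > app.txt && git commit -q -am "dev changes"

git checkout -q -b my_tagged_branch tagname   # new branch at the tagged commit, HEAD attached to it
on_branch=$(git symbolic-ref --short HEAD)
echo fix > hotfix.txt && git add hotfix.txt && git commit -q -m "fix on top of the tag"

git checkout -q master
git merge -q --no-ff -m "Merge my_tagged_branch" my_tagged_branch
git log --graph --oneline
```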



Source: https://stackoverflow.com/questions/63982737/git-how-get-an-old-tagged-version-into-master-without-losing-history
