How to remove commits from git history but otherwise keep the graph exactly the same, including merges?

礼貌的吻别 2021-01-14 12:25

What I have:

---A----B-----C-----D--------*-----E-------> (master)
                     \      /
                      1----2 (foo)


        
5 answers
  • 2021-01-14 13:08

    What I am proposing gives you a clean, linear history, which is essentially what rebase is designed to produce, so the merge is not preserved. Still, I hope it gives you a way to remove B and B' from the commit history. Here is the explanation:

    Repo recreation output:
    ---A----B-----B'-----C--------D-------> (master)
                          \      /
                           1----2 (foo)
    
    git log --graph --all --oneline --decorate #initial view the git commit graph
    * dfa0f63 (HEAD -> master) add E
    *   843612e Merge branch 'foo'
    |\  
    | * 3fd261f (foo) add 2
    | * ed338bb add 1
    |/  
    * bf79650 add C
    * ff94039 modify B
    * 583110a add B
    * cd8f6cd add A
    
    git rebase -i HEAD~5 # in the todo list, drop 583110a/add B and ff94039/modify B
    
    $ git log --graph --all --oneline --decorate
    * 701d9e7 (HEAD -> master) add E
    * 5a4be4f add 2
    * 75b43d5 add 1
    * 151742d add C
    | * 3fd261f (foo) add 2
    | * ed338bb add 1
    | * bf79650 add C
    | * ff94039 modify B
    | * 583110a add B
    |/  
    * cd8f6cd add A
    
    $ git rebase -i master foo #drop 583110a/add B and ff94039/modify B again
    
    $ git log --graph --all --oneline --decorate #view the git commit graph
    
    * 701d9e7 (HEAD -> foo, master) add E
    * 5a4be4f add 2
    * 75b43d5 add 1
    * 151742d add C
    * cd8f6cd add A
    

    Lastly, the final output might not be in the order you expected (A--C--1--2--E). However, you can rearrange the order in interactive mode again: try git rebase -i HEAD~n.
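
    The first rebase above can also be run without opening an editor. The following is a minimal sketch, not part of the original answer: it rebuilds a throwaway repo shaped like the one above (the `add` helper, the file names, and the identity values are made up for illustration, so the hashes will differ) and uses GIT_SEQUENCE_EDITOR with sed to change `pick` to `drop` on the two B lines of the todo list.

```shell
set -e
# Throwaway identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: append text to a file and commit it with the given message.
add() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

add A a  "add A"
add B b  "add B"
add B b2 "modify B"
add C c  "add C"
git checkout -q -b foo
add 1 one "add 1"
add 2 two "add 2"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'foo'" foo
add E e  "add E"

# Mark the two B commits as "drop" in the todo list instead of editing it by hand.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/add B/s/^pick/drop/' -e '/modify B/s/^pick/drop/'" \
  git rebase -i HEAD~5

git log --graph --all --oneline --decorate
```

    As in the listing above, master ends up linear while foo still points at the old commits.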

    Note: It's best to avoid rewriting history that has already been published. I am a newbie still exploring git, so I hope the above solution holds up. That said, I am sure there are plenty of easier solutions available online. I found this article quite helpful, for future reference.

  • 2021-01-14 13:18

    The first thing to understand is that commits are immutable objects. When you rewrite history as you propose, you will end up with a completely different set of commits. The parent is part of each commit's immutable hash, among other things that you can't change. If you do what you propose, your history will look like this:

         D'-----E'-----> (master)
        /
    ---A----B-----C-----D--------E-------> (abandoned)
                         \      /
                          1----2 (foo)
    

    To achieve this, you would simply rebase D..E onto A and reset master to E'. You can (but really don't have to) then rebase 1..foo onto D'.

    A much simpler, and in my opinion correct, way would be to just delete the file in a new commit:

    ---A----B-----C-----D--------E-----F-----> (master)
                         \      /
                          1----2 (foo)
    

    Here F is the result of git rm that_file. The purpose of git is to maintain history. Pruning it just because it doesn't look pretty isn't productive (again, my opinion). The only time I would recommend the former option is if the file in question has sensitive information like passwords in it.
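
    To make the trade-off concrete, here is a small sketch (a throwaway repo; the file contents and identity values are invented for illustration) showing that a git rm commit removes the file from the tip only. The old version stays readable from the earlier commit, which is why this option is not enough for secrets.

```shell
set -e
# Throwaway identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "docs" > README
echo "hunter2" > that_file
git add README that_file
git commit -q -m "add README and that_file"

# Commit F: delete the file going forward.
git rm -q that_file
git commit -q -m "remove that_file"

tracked=$(git ls-files)                   # files tracked at the tip
recovered=$(git show HEAD~1:that_file)    # contents read back from the previous commit

echo "tracked now: $tracked"
echo "still in history: $recovered"
```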

    If, on the other hand, scrubbing the file is what you want, you will have to take more extreme measures. For example: How to remove file from Git history?

  • 2021-01-14 13:21

    So I use rebase -i f0e0796 and remove B 5ccb371 and C a46df1c, correct? If I interpret the result correctly, this is what gitk shows me for my repo, although git branches still lists the second branch.

    ...A---1---2---E    master
    

    Can anyone tell me what happened here?

    That's what it's built to do: produce a merge-free linear history from a single tip to a single base, preserving all the parts that might still need a mergeback to the new base.

    The rebase docs could be clearer about this: "commits which are clean cherry-picks (as determined by git log --cherry-mark …) are always dropped." is mentioned only as an aside in an option for how to treat empty commits, and "by default, a rebase will simply drop merge commits from the todo list, and put the rebased commits into a single, linear branch." is only mentioned farther along, in the description of another option. But that's what it's for, to automate the tedious identification and elimination of already-applied fixes and noise merges from an otherwise-straightforward cherry-pick.


    Is git rebase the feature I am looking for my problem?

    Not really. The --rebase-merges option is being beefed up, and Inigo's answer works well for your specific case, but see the warnings in its docs: it has real limitations and caveats. As Inigo's answer points out, "[t]hese steps assume the exact repo you show in your question", and "git rebase just automates a series of steps that you can just as well do manually". The reason for this answer is, for one-off work it's generally better to just do it.

    Rebase was built to automate a workflow where you have a branch you're merging from or otherwise keeping in sync with during development, and at least for the final mergeback (and maybe a few times before that) you want to clean up your history.

    It's handy for lots of other uses (notably carrying patches), but again: it's not a cure-all. You need lots of hammers. Many of them can be stretched to serve in a pinch, and I'm a big fan of "whatever works", but I think that's best for people who are already very well acquainted with their tools.

    What you want isn't to produce a single, clean linear history, you want something different.

    The general way to do it with familiar tools is easy. Starting from your demo script, it'd be

    git checkout :/A; git cherry-pick :/D :/1 :/2; git branch -f foo
    git checkout foo^{/D}; git merge --no-ff foo; git cherry-pick :/E; git branch -f master
    

    and you're done.

    Yes, you could get git rebase -ir to set this up for you, but when I looked at the pick list that produces, editing in the right instructions did not seem simpler or easier than the above sequence. There's figuring out what exact result you want, and figuring out how to get git rebase -ir to do it for you, and there's just doing it.

    git rebase -r --onto :/A :/C master
    git branch -f foo :/2
    

    is the "whatever works" answer I'd probably use for, as Inigo says "the exact repo you show in your question". See the git help revisions docs for the message-search syntax.
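
    For readers who have not met the message-search syntax, here is a minimal sketch in a throwaway repo (commit messages, file names, and identity values are made up for illustration). `:/text` resolves to the youngest commit reachable from any ref whose message matches the regex, and `rev^{/text}` restricts the search to commits reachable from rev. The messages are unique here, so the lookups are unambiguous.

```shell
set -e
# Throwaway identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

for name in A C D; do
  echo "$name" > "$name"
  git add "$name"
  git commit -q -m "add $name"
done
git branch foo HEAD~1          # foo points at "add C"

# Search every ref for a commit whose message matches "add C".
by_message=$(git log -1 --format=%s ':/add C')
# Search only the history of foo for a commit whose message matches "add A".
from_foo=$(git log -1 --format=%s 'foo^{/add A}')

echo "$by_message / $from_foo"
```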

  • 2021-01-14 13:22

    git rebase by default only rebases to a single lineage of commit history, because that is more commonly what people want. If you don't tell it otherwise, it will do it for the branch you have checked out (in your case that was master). That is why you ended up with a rebased master branch with the foo commits grafted on rather than merged in, and with foo itself unchanged and no longer connected.

    If you have git version 2.18 or greater you can use the --rebase-merges option* to tell git to recreate the merge history rather than linearize it as it does by default. The rebased history will have the same branch-offs and merges-back in. Below I'll walk you through the steps for achieving what you want using --rebase-merges.

    These steps assume the exact repo you show in your question.

    1. git checkout master
    2. git rebase -i --rebase-merges f0e0796
    3. in the interactive rebase todo file:
      • remove the two commits you wanted to drop (or comment them out, or change pick to drop or d)
      • on a new line immediately after the line label foo, add the following:
      exec git branch -f foo HEAD
      
      (see below for explanation)
    4. save and close the todo file and voilà, git will rebase the commits with the graph looking exactly as you wanted.


    the todo file explained

    git rebase just automates a series of steps that you can just as well do manually. This sequence of steps is represented in the todo file. git rebase --interactive allows you to modify the sequence before it executes.

    I'll annotate it with an explanation including how you would do it manually (good learning experience). It's important to get a feel for this if you do a lot of rebases in the future, so you have good bearings when merge conflicts occur, or when you tell the rebase to pause at points so you can do some manual mods.

    label onto                  // labels "rebase onto" commit (f0e0796)
                                // this is what you would do in your head
                                // if doing this manually
    # Branch foo
    reset onto                  // git reset --hard <onto>
    drop 5ccb371 add B          // skip this commit
    drop a46df1c modify B       // skip this commit
    pick 8eb025b add C          // git cherry-pick 8eb025b
    label branch-point          // label this commit so we can reset back to it later
    pick f5b0116 add 1          // git cherry-pick f5b0116
    pick 175e01f add 2          // git cherry-pick 175e01f
    label foo                   // label this commit so we can merge it later
                                //   This is just a rebase internal label. 
                                //   It does not affect the `foo` branch ref.
    exec git branch -f foo HEAD // point the `foo` branch ref to this commit 
    
    reset branch-point # add C  // git reset --hard <branch-point>
    merge -C b763a46 foo # Merge branch 'foo'  // git merge --no-ff foo
                                               // use comment from b763a46
    

    exec git branch -f foo HEAD explained

    As I mentioned above, git rebase only operates on one branch. What this exec command does is change the ref foo to point to the current head. As you can see in the sequence in the todo file, you are telling it to do this right after it has committed the last commit of the foo branch ("add 2"), which is conveniently labeled label foo in the todo file.

    If you don't need the foo ref anymore (e.g. it's a feature branch and this is its final merge) you can skip adding this line to the todo file.

    You can also skip adding this line and separately repoint foo to the commit you want it to after the rebase is done:

    git branch -f foo <hash of the rebased commit that should be the new head of `foo`>
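
    The whole procedure can be scripted for experimentation. This is a minimal sketch, assuming a throwaway repo that mirrors the todo file above (the `add` helper, the file names, and the identity values are hypothetical, and the hashes will differ). sed marks the two B commits as drop, and foo is repointed after the rebase at the second parent of the recreated merge instead of through an exec line.

```shell
set -e
# Throwaway identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: append text to a file and commit it with the given message.
add() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

add A a  "add A"
base=$(git rev-parse HEAD)                # the commit to rebase onto
add B b  "add B"
add B b2 "modify B"
add C c  "add C"
git checkout -q -b foo
add 1 one "add 1"
add 2 two "add 2"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'foo'" foo

# Steps 2 and 3: rebase with merges recreated, dropping the two B commits.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/add B/s/^pick/drop/' -e '/modify B/s/^pick/drop/'" \
  git rebase -i --rebase-merges "$base"

# The tip of master is the recreated merge; its second parent is the new "add 2".
git branch -f foo master^2

git log --graph --all --oneline --decorate
```

    Repointing foo afterwards only works this simply because the merge is the tip of master in this sketch; with later commits on top, the exec line in the todo file is the more direct route.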
    

    Let me know if you have any questions.


    *If you have an older version of git, you can use the now deprecated --preserve-merges option, though it isn't compatible with rebase's interactive mode.

  • 2021-01-14 13:22

    To rearrange the commit history, there are several ways.

    The problem with rebase, when you want to change an entire repo's history, is that it only moves one branch at a time. Additionally it has problems dealing with merges, so you cannot simply rebase D and E onto A while preserving the more recent history as it exists now (because E is a merge).

    You can work around all that, but the method is complicated and error-prone. There are tools designed for full-repo rewrites. You might want to look at filter-repo (a tool that replaces filter-branch), but it sounds like you're just trying to scrub a particular file from your history, which (1) might be a good job for the BFG Repo Cleaner, or (2) is actually an easy enough task with filter-branch.

    (If you want to look into BFG, https://rtyley.github.io/bfg-repo-cleaner/ ; if you want to look into filter-repo, https://github.com/newren/git-filter-repo)

    To use filter-branch for this purpose

    git filter-branch --index-filter 'git rm --cached --ignore-unmatch path/to/file' --prune-empty -- --all
    

    However, you indicated that you need the file not to be in the repo (as a counter to someone's suggestion to just delete it in the next commit). So you need to understand that git doesn't give up information quite that easily. After using any of these techniques, you could still extract the file from the repo.

    This is kind of a big topic and has been discussed a number of times in various questions/answers on SO, so I suggest searching for what you really need to be asking: how to permanently remove a file that should never have been under source control.

    A few notes:

    1 - If there are passwords and they were ever pushed to a shared remote, those passwords are compromised. There is nothing you can do about it; change the passwords.

    2 - Each repo (the remote and each and every clone) has to be deliberately scrubbed, or thrown away and replaced. (The fact that you can't force someone to do that if they don't want to cooperate is one of the reasons for (1).)

    3 - In the local repo where you made the repairs, you have to get rid of the reflogs (as well as backup refs that may have been created if you used a tool like filter-branch) and then run gc. Or, it may be easier to re-clone to a new repo that only fetches the new versions of the branches.

    4 - Cleaning up the remote may not even be possible, depending on how it's hosted. Sometimes the best you can do is nuke the remote and then recreate it from scratch.
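
    Note 3 can be sketched end to end in a throwaway repo. The file name secret.txt, its contents, and the identity values are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that newer git versions add to filter-branch. The sketch checks that the blob is still present right after the rewrite and gone only after the backup refs and reflogs are removed and gc has run.

```shell
set -e
# Throwaway identity so commits work on a machine without git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "public" > README
git add README
git commit -q -m "add README"
echo "hunter2" > secret.txt
git add secret.txt
git commit -q -m "add secret"
echo "more" >> README
git commit -q -am "update README"

blob=$(git rev-parse HEAD:secret.txt)     # object id of the file we want gone

git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' --prune-empty -- --all >/dev/null 2>&1

# The rewrite alone does not delete the object: backup refs and reflogs keep it alive.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop filter-branch's backup refs, expire the reflogs, then garbage-collect.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "after rewrite: $before, after cleanup: $after"
```

    This only cleans the local repo; notes 2 and 4 about clones and the remote still apply.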
