Can I rewrite an entire git repository's history to include something we forgot?

Submitted by 你说的曾经没有我的故事 on 2020-01-01 05:23:23

Question


We recently completed a conversion from Mercurial to Git. Everything went smoothly, and we were even able to get the transforms needed to make everything look and work relatively correctly in the repository. We added a .gitignore and got underway.

However, we're experiencing extreme slowdowns as soon as we incorporate or work with any of our old feature branches. A little exploring showed that the .gitignore was only added to the develop branch, so when we look at other commits without merging develop into them, git chugs because it chokes trying to analyze all our build artifacts (binary files and so on): these old branches have no .gitignore file.

What we'd like to do is effectively insert a new root commit with the .gitignore so that it retroactively appears in all heads and tags. We're comfortable with a rewrite of history. Our team is relatively small, so getting everyone to halt for this operation and re-clone their repositories once the rewrite is done is no problem.

I've found information about rebasing master onto a new root commit, and this works for master. The problem is that it leaves our feature branches attached to the old history tree, and it also replays the entire history with new commit dates and times.

Any ideas or are we out of luck on this one?


Answer 1:


What you want to do will involve two phases: retroactively add a new root with a suitable .gitignore and scrub your history to remove files that should not have been added. The git filter-branch command can do both.

Setup

Consider a representative of your history.

$ git lola --name-status
* f1af2bf (HEAD, bar-feature) Add bar
| A     .gitignore
| A     bar.c
| D     main.o
| D     module.o
| * 71f711a (master) Add foo
|/
|   A   foo.c
|   A   foo.o
* 7f1a361 Commit 2
| A     module.c
| A     module.o
* eb21590 Commit 1
  A     main.c
  A     main.o

For clarity, the *.c files represent C source files and *.o are compiled object files that should have been ignored.

On the bar-feature branch, you added a suitable .gitignore and deleted object files that should not have been tracked, but you want that policy reflected everywhere in your import.

Note that git lola is a non-standard but useful alias.

git config --global alias.lola \
  'log --graph --decorate --pretty=oneline --abbrev-commit --all'

New Root Commit

Create the new root commit as follows.

$ git checkout --orphan new-root
Switched to a new branch 'new-root'

The git checkout documentation notes a possible unanticipated state of the new orphan branch.

If you want to start a disconnected history that records a set of paths that is totally different from the one of start_point, then you should clear the index and the working tree right after creating the orphan branch by running git rm -rf . from the top level of the working tree. Afterwards you will be ready to prepare your new files, repopulating the working tree, by copying them from elsewhere, extracting a tarball, etc.

Continuing our example:

$ git rm -rf .
rm 'foo.c'
rm 'foo.o'
rm 'main.c'
rm 'main.o'
rm 'module.c'
rm 'module.o'

$ echo '*.o' >.gitignore

$ git add .gitignore

$ git commit -m 'Create .gitignore'
[new-root (root-commit) 00c7780] Create .gitignore
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore

Now the history looks like

$ git lola
* 00c7780 (HEAD, new-root) Create .gitignore
* f1af2bf (bar-feature) Add bar
| * 71f711a (master) Add foo
|/
* 7f1a361 Commit 2
* eb21590 Commit 1

That is slightly misleading because it makes new-root look like it is a descendant of bar-feature, but it really has no parent.

$ git rev-parse HEAD^
HEAD^
fatal: ambiguous argument 'HEAD^': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
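A less noisy way to confirm that a commit has no parent is to ask git for the commit together with its parent list. The following is a minimal sketch; is_root_commit is a hypothetical helper name chosen for illustration, not something git provides.

```shell
# is_root_commit is a hypothetical helper name, not a git command.
# It succeeds (exit status 0) when the given revision has no parents.
is_root_commit() {
  # "git rev-list --parents -n 1 REV" prints "<commit> <parent>..." on one
  # line, so a root commit yields exactly one field. An invalid revision
  # yields no output at all, which is also reported as "not a root".
  set -- $(git rev-list --parents -n 1 "$1")
  [ "$#" -eq 1 ]
}
```

In the example repository, `is_root_commit new-root && echo "no parent"` should print the message, while the same check on bar-feature should print nothing.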

Make note of the SHA for the orphan because you will need it later. In this example, it is

$ git rev-parse HEAD
00c778087723ae890e803043493214fb09706ec7

Rewriting History

We want git filter-branch to make three broad changes.

  1. Splice in the new root commit.
  2. Delete all the temporary files.
  3. Use the .gitignore from the new root unless one already exists.

On the command line, that is incanted as

git filter-branch \
  --parent-filter '
    test $GIT_COMMIT = eb215900cd15ca2cf9ded74f1a0d9d25f65eb2bf && \
              echo "-p 00c778087723ae890e803043493214fb09706ec7" \
      || cat' \
  --index-filter '
    git rm --cached --ignore-unmatch "*.o"; \
    git ls-files --cached --error-unmatch .gitignore >/dev/null 2>&1 ||
      git update-index --add --cacheinfo \
        100644,$(git rev-parse new-root:.gitignore),.gitignore' \
  --tag-name-filter cat \
  -- --all

Explanation:

  • The --parent-filter option hooks in your new root commit.
    • eb215... is the full SHA of the old root commit, cf. git rev-parse eb215
  • The --index-filter option has two parts:
    • Running git rm as above deletes anything matching *.o from the entire tree because the glob pattern is quoted and interpreted by git rather than the shell.
    • Check for an existing .gitignore with git ls-files, and if it is not there, point to the one in new-root.
  • If you have any tags, they will be mapped over with the identity operation, cat.
  • The lone -- terminates options, and --all is shorthand for all refs.
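If you would rather not copy SHAs by hand, the old root can be looked up, and the parent filter's logic can be tried in isolation before running the real rewrite. This is only a sketch: find_root_commits, splice_parent, OLD_ROOT and NEW_ROOT are hypothetical names introduced here. The git documentation states that the parent filter receives the current parent list on stdin (empty for a root commit) and prints the new one.

```shell
# Hypothetical helper names; neither is a git command.

# Print the full SHA of every parentless commit reachable from the given ref.
# A repository can have more than one root, so expect one SHA per line.
find_root_commits() {
  git rev-list --max-parents=0 "$1"
}

# Mirrors the logic of the --parent-filter shown above, with the SHAs taken
# from the assumed variables OLD_ROOT and NEW_ROOT. filter-branch sets
# GIT_COMMIT to the commit being rewritten and feeds the existing parent
# list on stdin.
splice_parent() {
  if [ "$GIT_COMMIT" = "$OLD_ROOT" ]; then
    echo "-p $NEW_ROOT"   # give the old root the new root as its parent
  else
    cat                   # every other commit keeps its parents unchanged
  fi
}
```

For example, `find_root_commits master` should print eb215900cd15ca2cf9ded74f1a0d9d25f65eb2bf in the sample history. splice_parent is meant for experimenting at the prompt; it is not passed to filter-branch as-is.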

The output you see will resemble

Rewrite eb215900cd15ca2cf9ded74f1a0d9d25f65eb2bf (1/5)rm 'main.o'
Rewrite 7f1a361ee918f7062f686e26b57788dd65bb5fe1 (2/5)rm 'main.o'
rm 'module.o'
Rewrite 71f711a15fa1fc60542cc71c9ff4c66b4303e603 (3/5)rm 'foo.o'
rm 'main.o'
rm 'module.o'
Rewrite f1af2bf89ed2236fdaf2a1a75a34c911efbd5982 (5/5)
Ref 'refs/heads/bar-feature' was rewritten
Ref 'refs/heads/master' was rewritten
WARNING: Ref 'refs/heads/new-root' is unchanged

Your originals are still safe. The master branch now lives under refs/original/refs/heads/master, for example. Review the changes in your newly rewritten branches. When you are ready to delete the backup, run

git update-ref -d refs/original/refs/heads/master

You could cook up a command to cover all backup refs in one command, but I recommend careful review for each one.
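For reference, such a command could look like the sketch below, which splits the job into a listing step and a deleting step so that the review can happen in between. The function names are hypothetical; the git commands inside them are the ones used in the filter-branch documentation for the same cleanup.

```shell
# Hypothetical helper names; not git commands.

# Print every backup ref that filter-branch saved under refs/original/.
list_backup_refs() {
  git for-each-ref --format='%(refname)' refs/original/
}

# Delete all of those backup refs. This is not reversible through the refs
# themselves, so run it only after reviewing the rewritten branches.
delete_backup_refs() {
  list_backup_refs | while read -r ref; do
    git update-ref -d "$ref"
  done
}
```

A review pass might run list_backup_refs and then, for each branch, something like `git diff --stat refs/original/refs/heads/master master` before calling delete_backup_refs.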

Conclusion

Finally, the new history is

$ git lola --name-status
* ab8cb1c (bar-feature) Add bar
| M     .gitignore
| A     bar.c
| * 43e5658 (master) Add foo
|/
|   A   foo.c
* 6469dab Commit 2
| A     module.c
* 47f9f73 Commit 1
| A     main.c
* 00c7780 (HEAD, new-root) Create .gitignore
  A     .gitignore

Observe that all the object files are gone. The modification to .gitignore in bar-feature is because I used different contents to make sure it would be preserved. For completeness:

$ git diff new-root:.gitignore bar-feature:.gitignore
diff --git a/new-root:.gitignore b/bar-feature:.gitignore
index 5761abc..c395c62 100644
--- a/new-root:.gitignore
+++ b/bar-feature:.gitignore
@@ -1 +1,2 @@
 *.o
+*.obj
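To double-check that no object file survives anywhere in the rewritten history, you can ask git for commits that touch a matching path. This is a sketch under the assumption that only branches and tags matter; commits_touching is a hypothetical helper name. It deliberately avoids --all, because that would also walk the backups under refs/original/, which still contain the old files.

```shell
# commits_touching is a hypothetical helper name, not a git command.
# Prints the SHA of each commit on any branch or tag that changes a path
# matching the given pathspec. The pattern is quoted by the caller so that
# git, not the shell, expands it.
commits_touching() {
  git log --branches --tags --format=%H -- "$1"
}
```

After a successful rewrite, `commits_touching '*.o'` should print nothing, while `commits_touching .gitignore` should list the new root and the bar-feature commit.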

The new-root ref is no longer useful, so dispose of it with

$ git checkout master
$ git branch -d new-root



Answer 2:


Disclaimer: This is theoretical (based on documentation), I have not done this. Clone and try.

From what I understand, you have never committed files that would now be filtered by the .gitignore you want to add at the root of your history.

Therefore, if you rebase your master branch onto a new root commit containing only the .gitignore, you won't actually modify the content of the commits. Afterwards you should be able to rebase each of your other branches onto the new history, and rebase will do the work for you.

Because the content of the commits is the same, the patch ID should remain the same, and rebase will only apply that which is necessary.

You will need to rebase each branch one by one, but that can easily be scripted.

More information can be found in the git rebase documentation, in the section RECOVERING FROM UPSTREAM REBASE at the end of the page.

EDIT: OK, never mind. I tested this and it doesn't work exactly this way. You have to give the rebase point in the new history for each branch "manually", which is a pain. It could still be made to work, but it is clearly a worse solution than the accepted answer.



Source: https://stackoverflow.com/questions/27927933/can-i-rewrite-an-entire-git-repositorys-history-to-include-something-we-forgot
