Question

I would like to squash an entire Git repository down to a single commit, and actually remove all other commits.

I have found several suggestions, including:

$ git reset --soft <root-commit>

This works with respect to the squashing, but it's still possible to check out the previous commits if you know their ID. How can I get rid of them as well?

Maybe the simplest solution would be to delete the .git directory, and run git init again, wouldn't it? If I re-add the origin, and then use git push --force, I could even keep the same GitHub repository, right?

PS: In this question I have clarified what I actually want to achieve.

Answer 1:

UPDATE - cantSleepNow's comment got me thinking about a couple caveats to my answer.

You want to be aware of the state of untracked files, especially if you do rebuild the repository. What exactly that means depends on how you use your work tree, and on how your ignore rules are set up.
You also may have repository-specific configuration to consider.

Untracked Files:

I generally keep my worktree in a "clean" state, meaning that git status should not report anything untracked most of the time. Further, I try to use .gitignore for my ignore rules, which should ideally be few in number (directory-based rules for output directories, pattern-based rules for IDE-generated files that might be sprinkled throughout the work tree...)

If you follow those same practices, then you usually shouldn't have to do anything special about untracked files; your ignore patterns will still be there when you init the new repo. However, if you previously had committed files that would match your ignore rules (and if this is deliberate such that you still want them), then you'd have to force-add them to your new repo (or else remove the ignore rules, add them, and then re-add the ignore rules).

If you have local ignore rules in .git/info/exclude, then of course those would go away when you delete .git (unless you back them up).

If you keep untracked files that aren't in your ignore rules, you'll have to make sure you don't accidentally add them to the new repo. (I would encourage you to use ignore rules for those going forward.) One solution, if you know you don't need the contents of any untracked files, is to use git clean to be rid of them.
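One way to see where you stand before rebuilding is to list untracked and ignored files separately, and to do a dry run of git clean. The following is a minimal sketch in a throwaway directory; the file names (main.txt, notes.txt, build/out.bin) and the demo identity are made up for illustration.

```shell
set -eu
# Demo identity so commits work in a clean environment (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'build/' > .gitignore
echo 'tracked' > main.txt
git add .gitignore main.txt
git commit -q -m 'initial'

mkdir build
echo 'artifact' > build/out.bin   # hidden by the build/ ignore rule
echo 'scratch' > notes.txt        # untracked and not ignored

# Files that a later "git add ." would pick up
untracked=$(git ls-files --others --exclude-standard)
# Files currently hidden by ignore rules
ignored=$(git ls-files --others --ignored --exclude-standard)
echo "untracked: $untracked"
echo "ignored:   $ignored"

# Dry run: report what git clean would delete, without deleting anything
git clean -n -d
```

Anything listed as untracked here would end up in the new single commit unless you remove it or add an ignore rule first.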

Repo Configuration

Your .git directory can contain things like repo-specific config settings, hook scripts, local exclude rules (touched on above), LFS configuration (and object content), ...

If your usage of git is simple, you might not have any of these things. If you do anything that's repo-specific (and not checked in / source controlled), then it likely is stored under .git and you need to review whether to back it up. If you're not sure, then you may need to use a different method to safely clean the repo (so I'll provide one below).
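If you do decide to back things up before deleting .git, it could look roughly like the sketch below. It runs in a throwaway directory and only covers the three items it creates itself: the local config file, the hooks directory, and info/exclude. The setting name demo.setting and the hook body are hypothetical; LFS content and anything else under .git would need its own handling.

```shell
set -eu
repo=$(mktemp -d)
backup=$(mktemp -d)
cd "$repo"
git init -q

# Some repo-local state to preserve (all hypothetical examples)
git config demo.setting kept
mkdir -p .git/hooks .git/info
printf '#!/bin/sh\nexit 0\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
echo 'scratch/' >> .git/info/exclude

# Back up the repo-local pieces before wiping .git
cp -p .git/config "$backup/config"
cp -Rp .git/hooks "$backup/hooks"
cp -p .git/info/exclude "$backup/exclude"

# Rebuild the repository
rm -rf .git
git init -q

# Restore the saved pieces into the fresh .git
mkdir -p .git/hooks .git/info
cp -p "$backup/config" .git/config
cp -Rp "$backup/hooks/." .git/hooks/
cp -p "$backup/exclude" .git/info/exclude

restored=$(git config demo.setting)
echo "restored setting: $restored"
```

Restoring the whole config file also brings back remotes and branch settings, which may or may not be what you want.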

So getting back to your options...

Originally I suggested that the simplest thing to do, if you want to be sure history is gone, is

rm -rf .git
git init
git add .
git commit

Any other procedure is mostly just a longer / more error-prone way to imitate this result. But you may have extra steps if you identified things you want to keep from .git, like hooks or local config. And if you aren't sure whether anything in .git might still be needed, then you need a way to just delete what you don't want.

To cleanse a repo of content:

First, make sure the content you want in your new single commit is what's currently checked out in your work tree.

Now, if you aren't on master, go ahead and

git branch -f master
git checkout master

Then delete all of the refs. You can use git commands to do this (and in some circumstances that's safer), but the simplest way if you know you want to wipe them all out is

rm -f .git/packed-refs
rm -rf .git/refs/*

This will kind of confuse git, but it will leave you in a state where your index and work tree are unchanged (still your old master state), but there's no recognized parent commit, so everything is a newly added file.

git commit

You should get a new commit with no history, and master should point to it.

Now you need to get rid of the reflog, because it can still reach the old commits. Again you could use git commands, but I've had the best luck with

rm -rf .git/logs

And now you can get rid of the old commits with

git gc --aggressive --prune=now

and verify that old commits are no longer to be found.
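To check the whole sequence end to end, here is a small sketch that builds a three-commit repository in a temporary directory, applies the steps above, and then tests whether the old tip commit can still be found. It assumes Git's default file-based ref storage (the steps delete ref files directly); the file name and demo identity are placeholders.

```shell
set -eu
# Demo identity so commits work in a clean environment (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

for n in 1 2 3; do
  echo "version $n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
old_tip=$(git rev-parse HEAD)

# Delete all refs; HEAD now names a branch that has no commit
rm -f .git/packed-refs
rm -rf .git/refs/*

# The index is untouched, so this records the same content as a root commit
git commit -q -m 'single commit'

# Drop the reflogs, then prune everything that is no longer reachable
rm -rf .git/logs
git gc -q --aggressive --prune=now

commit_count=$(git rev-list --count --all)
if git cat-file -e "$old_tip" 2>/dev/null; then old_gone=no; else old_gone=yes; fi
echo "commits: $commit_count, old tip gone: $old_gone"
```

git cat-file -e exits non-zero when the object does not exist, which is what makes it usable as the "is it really gone" check.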

That's fine for your local repo; but GitHub...

You've expressed a desire to keep your existing repo, but you've also noted that you don't want someone to be able to get the old commits even if they know the SHA1.

A force push will overwrite the ref for the upstream of the current branch (probably master since you haven't specified otherwise). It will not affect other refs (branches, tags) if there are any, and it will not affect other commits.

To remove commits, you need (1) to be sure nothing (short of a direct SHA1 reference) can reach them, and (2) to run git gc. A tweet from github support says:

We run git gc at most once per day, triggered automatically by a push.

So it seems you don't have much control over that. The force push might trigger a gc, and that gc might clear away the old commits, but you'd have to test whether it really did (clear your browser cache, try to access one of the commits that should be gone).

As with the local repo, if this is important then it's probably easier and safer to delete the repo and create a new one.
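The limits of a force push can be observed locally. In this sketch a bare repository in a temporary directory stands in for the GitHub remote (it does not model GitHub's own gc schedule), and the tag v1 is a made-up example of "another ref" that the push leaves alone.

```shell
set -eu
# Demo identity so commits work in a clean environment (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
remote="$work/remote.git"          # local stand-in for the hosted repository
git init -q --bare "$remote"

# Old history, pushed together with a tag
git init -q "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/master
echo one > file.txt; git add file.txt; git commit -q -m 'old 1'
echo two > file.txt; git commit -q -am 'old 2'
old_tip=$(git rev-parse HEAD)
git tag v1
git remote add origin "$remote"
git push -q origin master v1

# Rebuild locally, re-add the origin, and force push
rm -rf .git
git init -q
git symbolic-ref HEAD refs/heads/master
git add .
git commit -q -m 'single commit'
new_tip=$(git rev-parse HEAD)
git remote add origin "$remote"
git push -q --force origin master

# What the remote looks like afterwards
remote_master=$(git --git-dir="$remote" rev-parse master)
remote_tag=$(git --git-dir="$remote" rev-parse v1)
if git --git-dir="$remote" cat-file -e "$old_tip" 2>/dev/null; then
  old_on_remote=yes
else
  old_on_remote=no
fi
echo "master moved: $remote_master, tag still at: $remote_tag, old commit present: $old_on_remote"
```

The branch moves to the new commit, but the tag still points at the old tip and the old objects are still stored on the remote side.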

Answer 2:

Yes, if you delete .git, you can start over from scratch.

but it's still possible to checkout the previous commits if you know their id

Sure...

Maybe the simplest solution would be to delete the .git directory, and run git init again, wouldn't it? If I re-add the origin, and then use git push --force, I could even keep the same GitHub repository, right?

Yes, but then all those commits are still on the remote (github) repository, as you noticed.

From the comments, you wish to delete a file (with a license) which was in there from the beginning.

A: Delete everything

If you do not care about the history at all, then proceed to delete everything, including the GitHub repository. In fact, I myself would simply create a new GitHub repository and a new local one, and start from scratch; just committing everything as if it were the very first commit (which it is).

B: Manual rebase

If you would like to keep some history, you can absolutely do that as well. Here's some pseudocode:

Create a new, empty, local git repository (git init /new).
For each $COMMIT in the old repos (let's call it /old), linearly from ROOT to master:
- cd /old ; git checkout $COMMIT
- rm -rf /new/* ; cp -r /old/* /new/ ; rm -f /new/license.txt
  - This syntax skips all directory entries starting with ., i.e., .git. Refine this if you actually do have files starting with . (like .gitignore) which you want to keep.
- cd /new ; git add -A ; git commit -m "$MESSAGE"
  - Extracting the $MESSAGE from the old repo is left as an exercise ;)

This is basically a manual git rebase -i which makes 100% sure that you have 100% control of what ends up in the repository. It is quite straightforward and there can be no conflicts, prompts or whatever.
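A runnable version of that loop might look like the sketch below. It departs from the pseudocode in one respect: it uses git archive piped into tar instead of cp, so that dot-files such as .gitignore are carried over without special handling. It assumes a linear history on master, builds its own two-commit /old stand-in in a temporary directory, and copies only the commit message (author and dates are not preserved). The file names and demo identity are placeholders.

```shell
set -eu
# Demo identity so commits work in a clean environment (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
old="$demo/old"
new="$demo/new"

# Stand-in for the existing repository, with license.txt present from the start
git init -q "$old"
git -C "$old" symbolic-ref HEAD refs/heads/master
echo 'license text' > "$old/license.txt"
echo 'v1' > "$old/app.txt"
echo '*.log' > "$old/.gitignore"
git -C "$old" add -A
git -C "$old" commit -q -m 'first'
echo 'v2' > "$old/app.txt"
git -C "$old" commit -q -am 'second'

git init -q "$new"
git -C "$new" symbolic-ref HEAD refs/heads/master

# Replay every commit, oldest first
for commit in $(git -C "$old" rev-list --reverse master); do
  # Empty the new work tree, keeping only its .git directory
  find "$new" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
  # Export the snapshot of this commit, dot-files included
  git -C "$old" archive "$commit" | tar -xf - -C "$new"
  rm -f "$new/license.txt"
  git -C "$new" add -A
  # Reuse the original commit message
  git -C "$old" log -1 --format=%B "$commit" | git -C "$new" commit -q -F -
done

new_count=$(git -C "$new" rev-list --count master)
license_commits=$(git -C "$new" rev-list --count master -- license.txt)
echo "commits: $new_count, commits touching license.txt: $license_commits"
```

If an old commit changed nothing but license.txt, git commit would stop with "nothing to commit"; adding --allow-empty, or skipping such commits, is a choice you would have to make.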

C: Rebase with --exec

The third way would be like this:

cd /old
git checkout master
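# after each replayed commit, drop the file and fold the removal into that commit
git rebase --root --exec "git rm -q --ignore-unmatch license.txt && git commit -q --amend --no-edit --allow-empty"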
git clone --single-branch --branch master /old /new

This way you also end up with the same content in /new, but it will be awkward if you have merge commits. Depending on how and where license.txt was changed, you could also get spurious merge conflicts. I would probably try it once and, if it starts to be laborious, quickly switch to method B.

Answer 3:

The previous revisions (locally) will be removed by garbage collection. git has a number of safeguards implemented to try not to delete things at once but it can be hacked with options to remove everything that is not being pointed to by some reference (tags, branches, other things like reflog references, stash, etc). If you are considering "remote" branches, then you can force push into them so that they also lose the previous revisions.
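As an illustration of those safeguards and how to bypass them using only git commands, here is a sketch of one common pattern: create a parentless branch, move master onto it, expire the reflogs, and prune immediately. This particular sequence is not taken from the answers here, and it runs in a throwaway repository with placeholder file names and identity.

```shell
set -eu
# Demo identity so commits work in a clean environment (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo one > file.txt; git add file.txt; git commit -q -m 'old 1'
echo two > file.txt; git commit -q -am 'old 2'
old_tip=$(git rev-parse HEAD)

# Start a branch with no parent; the index still holds the current content
git checkout -q --orphan squashed
git commit -q -m 'single commit'

# Rename it over master, discarding the old branch
git branch -M master

# Reflogs still reference the old commits, so expire them all
git reflog expire --expire=now --expire-unreachable=now --all
# Skip the usual grace period for unreachable objects
git gc -q --prune=now

commit_count=$(git rev-list --count --all)
if git cat-file -e "$old_tip" 2>/dev/null; then old_gone=no; else old_gone=yes; fi
echo "commits: $commit_count, old tip gone: $old_gone"
```

Any other branch, tag, or stash that still points into the old history would keep those commits alive, so they have to be deleted before the gc as well.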

Answer 4:

You can use the squashing option in git rebase, particularly in its --interactive (or -i) mode (see squashing commits with rebase for a good presentation).

Note that git rebase is in itself a squashing mechanism, although it starts from a different problem: how to "Reapply commits on top of another base".

In interactive mode you're presented with a dedicated commit editor that lets you manage individual commits, picking them or squashing them, and lets you manually combine their commit messages.

The typical scenario is when you want to merge many small commits into one, simplifying the history log.

In the end, with git rebase you can squash the commit's base physically and logically.

There's also an --autosquash option.

Rebasing should solve the squashing part of the problem and get rid of the starting commit's base, combining the whole into a new single commit.
The correct solution depends, obviously, on a correct management of the branches. But the workflow is as simple as branching at the desired root (common ancestor) and rebasing on top of it.
You can then delete the rest.

Source: https://stackoverflow.com/questions/43853944/how-to-squash-a-git-repository-to-a-single-commit-and-destroy-everything-else

Tags

git

git-rewrite-history

How to squash a Git repository to a single commit and destroy everything else?
