I have a clone. I want to reduce the history on it, without cloning from scratch with a reduced depth. Worked example:
$ git clone git@github.com:apache/spa
Edit, Feb 2017: this answer is now outdated / wrong. Git can make a shallow clone shallower, at least internally. Git 2.11 also has --deepen to increase the depth of a clone, and it looks as though there are eventual plans to allow negative values (though right now they are rejected). It's not clear how well this works in the real world, and your best bet is still to clone the clone, as in jthill's answer.
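Below is a minimal sandbox sketch of deepening. It assumes Git 2.11 or later and a POSIX shell. Everything happens in a temporary directory, and the repositories src and clone and the commit counts are made up for illustration. The script makes a depth-2 clone, then uses git fetch --deepen=3 to pull in three more commits below the current shallow boundary. The file:// URL matters here, because Git ignores --depth when cloning from a plain local path.

```shell
#!/bin/sh
# Sandbox demo: deepen a shallow clone with --deepen (assumes Git 2.11+).
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical "upstream" repository with 10 linear commits.
git init -q "$work/src"
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" commit -q -m "commit $i"
done

# Shallow clone; file:// is required, plain local paths ignore --depth.
git clone -q --depth 2 "file://$work/src" "$work/clone"
before=$(git -C "$work/clone" rev-list --count HEAD)

# Ask for three more commits counted from the current shallow boundary.
git -C "$work/clone" fetch -q --deepen=3
after=$(git -C "$work/clone" rev-list --count HEAD)
echo "before: $before commits, after --deepen=3: $after commits"
```

On a linear history this should report 2 commits before and 5 after. With merges the counts can be larger, because depth is measured along every parent path.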
You can only deepen a repository. This is primarily because Git is built around adding new stuff. Shallow clones work like this: your (receiving) Git gets the sender (another Git) to stop sending "new stuff" once it reaches the shallow-clone-depth argument. Your Git also coordinates with the sender so that it understands why the sender stopped at that point, even though more history is obviously required. Your Git then writes the IDs of the "truncated" commits into a special file, .git/shallow. That file both marks the repository as shallow and records which commits are truncated.
Note that during this process, your Git is still adding new stuff. (Also, when it has finished cloning and exits, Git forgets what the depth was, and over time it becomes impossible even to figure out what it was. All Git can tell is that this is a shallow clone, because the .git/shallow file containing commit IDs still exists.)
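To see what a shallow clone actually records, here is a sandbox sketch. It assumes Git 2.15 or later, which is needed for git rev-parse --is-shallow-repository. The temporary repositories and the depth of 3 are illustrative. The shallow file holds the ID of the boundary commit (HEAD~2 in this linear example), not the number 3.

```shell
#!/bin/sh
# Sandbox demo: what a shallow clone records about itself.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream with 6 linear commits.
git init -q "$work/src"
for i in 1 2 3 4 5 6; do
  echo "$i" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" commit -q -m "commit $i"
done
git clone -q --depth 3 "file://$work/src" "$work/clone"
cd "$work/clone"

# Git 2.15+ can report shallowness directly ("true" or "false").
is_shallow=$(git rev-parse --is-shallow-repository)
# .git/shallow lists the truncated (boundary) commits, not the depth.
boundary=$(cat .git/shallow)
echo "shallow: $is_shallow"
echo "boundary commit: $boundary"
git log --oneline
```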
The rest of Git continues to be built around this "add new stuff" concept, so you can deepen the clone, but not increase its shallowness. (There's no good, agreed-upon verb for this: the opposite of deepening a pit is filling it in, but fill has the wrong connotation. Diminish might work; I think I'll use that.)
In theory, git gc, which is the only part of Git that ever actually throws anything out,¹ could perhaps diminish a repository, even converting a full clone into a shallow one, but no one has written code to do that. There are some tricky bits, e.g., do you discard tags? Shallow clones start out sans tags for implementation reasons, so converting a repository to shallow, or diminishing an existing shallow repository, might call for discarding at least some tags. Certainly any tag pointing to a commit wiped out by the diminish action would have to go.
Meanwhile, the --depth argument to git-pack-objects (passed through from git repack) means something else entirely: it's the maximum length of a delta chain when Git uses its modified xdelta compression on the objects stored in each pack-file. This has nothing to do with the depth of particular parts of the commit DAG (as computed from each branch head).
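Here is a sandbox sketch of the distinction. The temporary repository and the numbers are illustrative. The command git repack -a -d -f --depth=4 --window=10 recompresses every object with delta chains of at most 4 links, but the number of commits reachable from HEAD does not change.

```shell
#!/bin/sh
# Sandbox demo: repack's --depth limits delta chains, not history.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
for i in 1 2 3 4 5 6 7 8; do
  echo "line $i" >> file.txt   # growing file, so deltas are worthwhile
  git add file.txt
  git commit -q -m "commit $i"
done
before=$(git rev-list --count HEAD)

# -a -d: rewrite everything into one pack and drop the old ones;
# -f: recompute deltas; --depth=4: no delta chain longer than 4;
# --window=10: how many candidate objects to compare for deltas.
git repack -q -a -d -f --depth=4 --window=10
after=$(git rev-list --count HEAD)
echo "commits before: $before, after repack --depth=4: $after"
```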
¹ Well, git repack winds up throwing things out as a side effect, depending on which flags are used, but it's invoked this way by git gc. This is also true of git prune. For these two commands to really do their job properly, they need git reflog expire run first. The "normal user" end of the clean-things-up sequence is git gc; it deals with all of this. So we can say that git gc is how you discard accumulated "new stuff" that turned out to be unwanted after all.
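You can check the footnote's point about the reflog in a throwaway repository. This is a sketch, and the branch name experiment is hypothetical. A commit that the HEAD reflog still mentions survives git gc --prune=now. It only disappears after git reflog expire has run first.

```shell
#!/bin/sh
# Sandbox demo: the reflog keeps an otherwise-unreachable commit alive.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
echo base > file.txt
git add file.txt
git commit -q -m base

# Commit on a side branch, switch back, and delete the branch.
git checkout -q -b experiment
echo experiment > file.txt
git commit -q -a -m experiment
dropped=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D experiment

# Only HEAD's reflog still mentions the commit, and gc respects that.
git gc -q --prune=now
if git cat-file -e "$dropped" 2>/dev/null; then kept_by_reflog=yes; else kept_by_reflog=no; fi

# Expire the reflog first; now gc really discards the commit.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$dropped" 2>/dev/null; then kept_after_expire=yes; else kept_after_expire=no; fi
echo "kept by reflog: $kept_by_reflog, kept after expire: $kept_after_expire"
```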
Since at least Git version 2.14.1 (September 2017) there is
git fetch --depth 10
This will fetch the newest commits from origin (if there are any) and then cut the local history off at a depth of 10 (if it was longer).
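Here is a sandbox sketch of that behaviour. It assumes a Git recent enough to shorten a shallow clone on fetch, and the repositories and the numbers 15, 12 and 10 are illustrative. Measured with git rev-list --count HEAD, a depth-12 clone of a 15-commit linear history should drop to 10 commits after git fetch --depth 10.

```shell
#!/bin/sh
# Sandbox demo: git fetch --depth N can make a shallow clone shallower.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream with 15 linear commits.
git init -q "$work/src"
i=1
while [ "$i" -le 15 ]; do
  echo "$i" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" commit -q -m "commit $i"
  i=$((i + 1))
done

git clone -q --depth 12 "file://$work/src" "$work/clone"
before=$(git -C "$work/clone" rev-list --count HEAD)

# Nothing new upstream, but --depth still moves the shallow boundary.
git -C "$work/clone" fetch -q --depth 10
after=$(git -C "$work/clone" rev-list --count HEAD)
echo "commits before: $before, after fetch --depth 10: $after"
```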
For most purposes your Git history is now 10 commits long, but beware that the old commits still linger in your local repository.
If your aim was a shorter log because you don't currently need years' worth of commit history, then you are done: your log will be short, and most common Git commands now see only 10 commits.
If your aim was to free disk space because older commits contain huge binary blobs that you don't need for current work, then you have to actually remove the old commits from your local repository. To do so you need to remove every reference that is holding on to them. As far as I know, that means the reflog and the tags, and also branches and stashes.
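To see which references are holding on to commits, the commands below list refs (branches, tags, remote-tracking branches and refs/stash), stashes, and HEAD's reflog. The throwaway repository, the tag v1 and the stash exist only to make the sketch self-contained. In a real repository you would run just the listing commands.

```shell
#!/bin/sh
# Sandbox demo: list the things that keep old commits reachable.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
echo one > file.txt
git add file.txt
git commit -q -m one
git tag v1                      # hypothetical tag on an old commit
echo two > file.txt
git commit -q -a -m two
echo wip > file.txt
git stash push -q               # hypothetical stash

# Every ref (branches, tags, remote-tracking branches, refs/stash).
refs=$(git for-each-ref --format='%(refname)')
echo "$refs"
# Stash entries and HEAD's reflog also keep commits alive.
git stash list
reflog_entries=$(git reflog | wc -l)
echo "HEAD reflog entries: $reflog_entries"
```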
Note that all commits still exist in the remote repository (origin). So if your aim was to remove a password from old commits, then you need to remove the commits from the remote repository, and also from every clone of it. See the links below for more info.
How to remove old commits (data loss warning! See the notes below):
To clear the reflog:
git reflog expire --expire=all --all
To remove all tags:
git tag -l | xargs git tag -d
Then actually remove the commits from disk:
git gc --prune=all
Now the old commits should be completely removed from disk.
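Here is an end-to-end sandbox sketch of the steps above. The repositories, the tag v0.1 and the file data.bin are hypothetical. A blob is committed once and deleted in the next commit, so it is present after a full clone. It should be gone after fetch --depth 1, reflog expiry, tag removal and gc --prune=all. The check uses git cat-file -e, which only tests whether an object exists.

```shell
#!/bin/sh
# Sandbox demo of the cleanup sequence; everything lives in a temp dir.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream: commit a payload, tag it, then delete it in the next commit.
git init -q "$work/src"
cd "$work/src"
echo "old payload we no longer need" > data.bin
git add data.bin
git commit -q -m "add payload"
git tag v0.1
payload=$(git rev-parse HEAD:data.bin)
git rm -q data.bin
echo "current" > readme.txt
git add readme.txt
git commit -q -m "drop payload"

# A full (non-shallow) clone holds the payload blob.
git clone -q "file://$work/src" "$work/clone"
cd "$work/clone"
if git cat-file -e "$payload" 2>/dev/null; then before=yes; else before=no; fi

# The steps from this answer.
git fetch -q --depth 1
git reflog expire --expire=all --all
git tag -l | xargs git tag -d >/dev/null
git gc -q --prune=all

if git cat-file -e "$payload" 2>/dev/null; then after=yes; else after=no; fi
echo "payload on disk before: $before, after: $after"
```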
A note about the remove-all-tags command: it removes every tag from your local repository. If all your tags are also on the remote, then this is fine; the next git fetch will refetch the relevant tags. But if you have tags that exist only in your local repository, you need to back them up somehow.
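One way to find the tags that exist only locally is to compare git tag -l against git ls-remote --tags origin. This is a sketch: the tag my-local-tag is hypothetical, and the sandbox stands in for a real clone. Any tag it prints could be pushed with git push origin <tag> before you delete it.

```shell
#!/bin/sh
# Sandbox demo: list tags that exist locally but not on origin.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/src"
cd "$work/src"
echo one > file.txt
git add file.txt
git commit -q -m one
git tag v1
git clone -q "file://$work/src" "$work/clone"
cd "$work/clone"
git tag my-local-tag            # hypothetical tag that was never pushed

# Remote tag names: strip the refs/tags/ prefix and the ^{} suffix that
# ls-remote prints for peeled annotated tags.
git ls-remote --tags origin \
  | sed -e 's#.*refs/tags/##' -e 's#\^{}$##' \
  | LC_ALL=C sort -u > "$work/remote-tags"
git tag -l | LC_ALL=C sort > "$work/local-tags"
# comm -23 keeps lines found only in the first file: local-only tags.
local_only=$(LC_ALL=C comm -23 "$work/local-tags" "$work/remote-tags")
echo "local-only tags: $local_only"
```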
The reflog is cleared automatically after a certain time by automatic git gc (by default, entries expire after 90 days, or after 30 days if they are no longer reachable from the current tip). Tags, however, stay around forever. So if you want to free disk space from old commits, you have to at least remove the tags manually.
The reflog is something like a local history of past repository states. Many Git commands record the previous state of the local repository in the reflog. With the reflog you can undo some commands, or at least retrieve lost data if you made a mistake. So think before you clear the reflog.
The reflog is entirely local to your repository.
See also:
https://linuxhint.com/git-shallow-clone-and-clone-depth/
http://gitready.com/intermediate/2009/02/09/reflog-your-safety-net.html
How do I edit past git commits to remove my password from the commit logs?
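# Run from the top level of the existing clone.
git clone --mirror --depth=5 "file://$PWD" ../temp   # shallow mirror of this repo
rm -rf .git/objects                                   # drop the full object store
mv ../temp/shallow ../temp/objects .git               # adopt the shallow objects
rm -rf ../temp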
This really isn't cloning "from scratch": it's purely local work, and it creates virtually nothing more than the shallowed-out pack files, probably tens of kilobytes in total. I'd venture you're not going to get more efficient than this. Any custom approach will use more space in the form of scripts and test work than this does in the form of a few kilobytes of temporary repo overhead.
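Here is a sandbox run of the recipe above that checks what it leaves behind. This is a sketch: the repository layout is made up, and the bash-only {shallow,objects} brace expansion is spelled out so the script also works in plain sh. After the swap, the repository should report itself as shallow, with 5 commits reachable from HEAD.

```shell
#!/bin/sh
# Sandbox demo of the clone-the-clone recipe on a throwaway repository.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
i=1
while [ "$i" -le 12 ]; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

# The recipe, run from the top level of the repository.
git clone -q --mirror --depth=5 "file://$PWD" ../temp
rm -rf .git/objects
mv ../temp/shallow ../temp/objects .git
rm -rf ../temp

is_shallow=$(git rev-parse --is-shallow-repository)
count=$(git rev-list --count HEAD)
echo "shallow: $is_shallow, commits reachable from HEAD: $count"
```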
OK, here's an attempt to do it as a shell script. It ignores non-default branches and assumes the remote is called 'origin':
#!/bin/sh
# Replace the .git directory of an existing clone with a depth-1 clone of
# its 'origin' remote. Only the remote's default branch survives.
set -e

repo="${1:?usage: git-slimmer.sh <folder_containing_git_repo>}"

# Refuse to run if anything would be lost: uncommitted or untracked files,
# or local commits that are not on the remote.
changed_lines=$(git -C "$repo" status --porcelain | wc -l)
ahead_of_remote=$(git -C "$repo" status | grep "Your branch is ahead" | wc -l)
remote_url=$(git -C "$repo" config --get remote.origin.url)
latest_sha=$(git -C "$repo" rev-parse HEAD)

if [ "$changed_lines" -gt 0 ]; then
  echo "Uncommitted or untracked changes - won't make the clone slimmer in that situation"
  exit 1
fi
if [ "$ahead_of_remote" -gt 0 ]; then
  echo "Local commits not in the remote - won't make the clone slimmer in that situation"
  exit 1
fi

# Throwaway depth-1 clone, removed however the script exits.
mkdir .git_slimmer
trap 'rm -rf .git_slimmer' EXIT
git clone --no-checkout --depth 1 "$remote_url" .git_slimmer/foo
latest_sha_for_new=$(git -C .git_slimmer/foo rev-parse HEAD)

if [ "$latest_sha" = "$latest_sha_for_new" ]; then
  mv "$repo/.git" "$repo/.gitOLD"
  mv .git_slimmer/foo/.git "$repo/"
  rm -rf "$repo/.gitOLD"
  # The --no-checkout clone has no populated index; rebuild it from the work tree.
  git -C "$repo" add .
else
  echo "SHA from head of existing git clone does not match the latest one from the remote: do a git pull first"
  exit 1
fi
Usage: git-slimmer.sh <folder_containing_git_repo>