I'm in the process of splitting up an old suite of applications which originally resided in a single Subversion repository.
I've converted it over to a Git repository.
I did this a couple of times: extract the commits for a single file and create a new repository from them. It goes somewhat like this:
$ c=10; for commit in $(git log --format=%h -- path/to/file | tac); do
    c=$((c+1))
    git format-patch -1 --stdout "$commit" > "$c.patch"
  done
This creates the patch files 11.patch, 12.patch and so on. I then edit these patches (using vim or perl, whichever seems best for the job), removing entire hunks for files that I'm not interested in, and maybe fixing the names in the diff hunk headers as well in case of renames.
Then I'd use git am to apply the patches in a new Git repository. If something doesn't come out right, I nuke the new repository, edit the patches again, and repeat the git am.
The reason I start counting from 10 is that I'm too lazy to prepend a leading 0 to the patch sequence; when there are more than 99 commits I just start at 99.
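The nuke-and-retry loop can be scripted so that each attempt starts from a clean repository. Here is a minimal sketch, assuming the edited patches sit in one directory and that their file names sort in commit order. The function name replay_patches is made up for illustration.

```shell
# Sketch: throw away the target repository and replay the edited patches.
# replay_patches PATCH_DIR TARGET_DIR  (hypothetical helper, not a git command)
replay_patches() {
    # Resolve to an absolute path so it is still valid after the cd below.
    patch_dir=$(cd "$1" && pwd) || return 1
    target=$2

    rm -rf "$target"          # "nuke" the previous attempt, if any
    git init -q "$target"
    (
        cd "$target" || exit 1
        # The shell expands the glob in sorted order; that must match commit order.
        git am "$patch_dir"/*.patch
    )
}
```

Calling it as `replay_patches patches new-repo` (both names are placeholders) recreates new-repo on every run, so a failed git am never leaves a half-applied state behind for the next attempt.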
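If the offset trick feels fragile, printf can add the leading zeros instead. The following is only a sketch of that variation: export_patches is a hypothetical helper name, the four-digit width is an arbitrary choice, and `git log --reverse` stands in for the `tac` in the loop above.

```shell
# Sketch: one patch per commit touching PATH, numbered 0001.patch, 0002.patch, ...
# export_patches PATH OUTDIR  (hypothetical helper, run inside the source repository)
export_patches() {
    path=$1
    outdir=$2
    n=0
    mkdir -p "$outdir"
    # --reverse lists the oldest commit first, so the numbers follow history.
    for commit in $(git log --reverse --format=%h -- "$path"); do
        n=$((n + 1))
        # %04d pads with zeros so the files sort correctly by name.
        git format-patch -1 --stdout "$commit" > "$outdir/$(printf '%04d' "$n").patch"
    done
}
```

With zero-padded names, a plain `*.patch` glob lists the patches in commit order regardless of how many there are (up to 9999 with this width).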
Adding to the second answer: "Maybe others can chime in on how to find out the previous name of a tracked file in case of renames."
This returns the files in your project together with the names they were renamed from:
for file in `git ls-files`; do git log --follow --name-only --pretty=format: $file | sort -n -b | uniq | sed '/^\s*$/d'; done
You can use these names to exclude files from the removal list.
The whole solution is:
for file in `git ls-files`; do git log --follow --name-only --pretty=format: $file | sort -n -b | uniq | sed '/^\s*$/d'; done > current.txt
git log --raw |awk '/^:/ { if (! printed[$6]) { print $6; printed[$6] = 1 }}'|while read f;do if [ ! -f $f ]; then echo $f;fi;done | sort > hist.txt
diff --new-line-format="" --unchanged-line-format="" hist.txt current.txt > for_remove.txt
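The last step is a set difference: names that appear in hist.txt but not in current.txt. As an alternative sketch, the same difference can be computed with sort and comm rather than GNU diff's line-format options. The function name is hypothetical, and both inputs are sorted first because current.txt is only sorted per file, not as a whole.

```shell
# Sketch: print the names that occur in HIST but not in CURRENT.
# names_only_in_history HIST CURRENT  (hypothetical helper)
names_only_in_history() {
    hist_sorted=$(mktemp)
    cur_sorted=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$hist_sorted"
    LC_ALL=C sort -u "$2" > "$cur_sorted"
    # -23 suppresses columns 2 and 3, leaving lines unique to the first file.
    LC_ALL=C comm -23 "$hist_sorted" "$cur_sorted"
    rm -f "$hist_sorted" "$cur_sorted"
}
```

Used as `names_only_in_history hist.txt current.txt > for_remove.txt`, it does not depend on the order or duplication of lines in either input.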
Here's how you can use git filter-branch to get rid of all files that you don't want:
Get a list of the filenames that you don't want to appear in the history, both the old names and the new names in case of renames. For example, put them in a file called toberemoved.txt.
Run git filter-branch like this:
$ git filter-branch --tree-filter "rm -f `cat toberemoved.txt`" branch1 branch2 ...
Here's the relevant part of the git filter-branch man page:
--tree-filter <command>
This is the filter for rewriting the tree and its contents. The
argument is evaluated in shell with the working directory set to
the root of the checked out tree. The new tree is then used as-is
(new files are auto-added, disappeared files are auto-removed -
neither .gitignore files nor any other ignore rules HAVE ANY
EFFECT!).
So just make sure that the paths in the list of files you want deleted are all relative to the root of the checked out tree.
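One caveat with the backtick form above: the list is expanded before filter-branch starts and is split on whitespace, so names containing spaces get broken up. Here is a sketch that reads the list one line at a time inside the filter instead. The function name and the TOBEREMOVED_LIST variable are made up for illustration, and it assumes the list holds one regular file per line (rm -f will not remove directories).

```shell
# Sketch: delete every path listed in LIST from the history of the given revs.
# remove_listed_files LIST REV...  (hypothetical helper, run at the repository root)
remove_listed_files() {
    list=$1
    shift
    # The filter runs inside a temporary checkout, so the list path must be absolute.
    case $list in
        /*) ;;
        *) list=$PWD/$list ;;
    esac
    # The list location is handed to the filter through the environment.
    TOBEREMOVED_LIST=$list git filter-branch --tree-filter '
        while IFS= read -r path_to_remove; do
            rm -f -- "$path_to_remove"
        done < "$TOBEREMOVED_LIST"
    ' -- "$@"
}
```

For example, `remove_listed_files toberemoved.txt branch1 branch2` rewrites both branches. As with any filter-branch run, it is safest to try it on a fresh clone first.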
Update:
To get the list of the files that were present in the past but are not in the current working directory, you can run the following. Note that it takes some further effort to keep the "history before renaming" of renamed files:
$ git log --raw |awk '/^:/ { if (! printed[$6]) { print $6; printed[$6] = 1 }}'|while read f;do if [ ! -f $f ]; then echo Deleted: $f;fi;done
The $6 is the name of the file affected by a commit, as shown in the --raw mode of git log.
See the --diff-filter option to git log if you want to know what happened ([D]eleted, [R]enamed, [M]odified, and so on) to each file in every commit.
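As a sketch of how --diff-filter can help with renames specifically, the following lists each rename git detects as an "old -> new" pair. The function name is hypothetical, and it assumes paths contain no tabs or newlines.

```shell
# Sketch: print every detected rename in the history as "old -> new".
# list_renames [LOG-ARGS...]  (hypothetical helper, run inside the repository)
list_renames() {
    # -M turns on rename detection; --diff-filter=R keeps only renamed entries.
    # --name-status prints "R<score><TAB>old<TAB>new" for each of them.
    git log -M --diff-filter=R --name-status --pretty=format: "$@" |
        awk -F '\t' '/^R/ { print $2 " -> " $3 }'
}
```

Rename detection is similarity-based, so a file that was renamed and heavily rewritten in the same commit may show up as a delete plus an add instead.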
Maybe others can chime in on how to find out the previous name of a tracked file in case of renames.