I am currently importing a CVS project into Git.
After the import, I want to rewrite the history to move an existing directory into a separate submodule.
I have a project with a utils library that's started to be useful in other projects, and I wanted to split its history off into a submodule. I didn't think to look on SO first, so I wrote my own. It builds the history locally, so it's a good bit faster; afterwards, if you want, you can set up the git submodule helper command's .gitmodules file and such, and push the submodule histories themselves anywhere you want.
The stripped command itself is here; the documentation is in the comments of the unstripped one that follows. Run it as its own command with subdir set, like subdir=utils git split-submodule if you're splitting the utils directory. It's hacky because it's a one-off, but I tested it on the Documentation subdirectory in the Git history.
#!/bin/bash
# put this or the commented version below in e.g. ~/bin/git-split-submodule
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))
git rm -rq --cached --ignore-unmatch "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}
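For anyone puzzling over the launcher line near the top of the script: it leans on two shell parameter expansions and a sed line range. Below is a minimal sketch of just those pieces, run outside Git. The value abc123, the placeholder strings and the temporary file are made up for illustration and are not part of the original command.

```shell
#!/bin/bash
# ${VAR-word} expands to word only when VAR is unset, otherwise to $VAR.
# ${VAR+word} expands to word only when VAR is set, otherwise to nothing.
unset GIT_COMMIT debug
outside=${GIT_COMMIT-"would exec filter-branch"}   # unset: the fallback is used
quiet=${debug+"set -x"}                            # unset: expands to nothing

GIT_COMMIT=abc123   # hypothetical id; filter-branch sets the real one
debug=1
inside=${GIT_COMMIT-"would exec filter-branch"}    # set: the variable's own value
traced=${debug+"set -x"}                           # set: the alternative is used

printf '%s\n' "outside=[$outside]" "quiet=[$quiet]" \
              "inside=[$inside]" "traced=[$traced]"

# sed 1,/SNIP/d deletes from line 1 through the first later line that
# matches SNIP. In the script that is the launcher line itself, because
# it contains the text "SNIP", so only the filter body is left over.
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '# header comment' \
              'launcher line mentioning SNIP' \
              'body line 1' 'body line 2' > "$demo"
body=$(sed 1,/SNIP/d "$demo")
rm -f "$demo"
printf '%s\n' "$body"
```

So when the command is run by hand, GIT_COMMIT is unset and the launcher line execs git filter-branch with the rest of the file as the index filter. Inside the filter only the body runs, since the launcher line has been cut away by sed.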
#!/bin/bash
# Git filter-branch to split a subdirectory into a submodule history.
# In each commit, the subdirectory tree is replaced in the index with an
# appropriate submodule commit.
# * If the subdirectory tree has changed from any parent, or there are
# no parents, a new submodule commit is made for the subdirectory (with
# the current commit's message, which should presumably say something
# about the change). The new submodule commit's parents are the
# submodule commits in any rewrites of the current commit's parents.
# * Otherwise, the submodule commit is copied from a parent.
# Since the new history includes references to the new submodule
# history, the new submodule history isn't dangling, it's incorporated.
# Branches for any part of it can be made casually and pushed into any
# other repo as desired, so hooking up the `git submodule` helper
# command's conveniences is easy, e.g.
# subdir=utils git split-submodule master
# git branch utils $(git rev-parse master:utils)
# git clone -sb utils . ../utilsrepo
# and you can then submodule add from there in other repos, but really,
# for small utility libraries and such, just fetching the submodule
# histories into your own repo is easiest. Setup on cloning a
# project using "incorporated" submodules like this is:
# setup: utils/.git
#
# utils/.git:
# @if _=`git rev-parse -q --verify utils`; then \
# git config submodule.utils.active true \
# && git config submodule.utils.url "`pwd -P`" \
# && git clone -s . utils -nb utils \
# && git submodule absorbgitdirs utils \
# && git -C utils checkout $$(git rev-parse :utils); \
# fi
# with `git config -f .gitmodules submodule.utils.path utils` and
# `git config -f .gitmodules submodule.utils.url ./`; cloners don't
# have to do anything but `make setup`, and `setup` should be a prereq
# on most things anyway.
# You can test that a commit and its rewrite put the same tree in the
# same place with this function:
# testit ()
# {
# tree=($(git rev-parse `git rev-parse $1`: refs/original/refs/heads/$1));
# echo $tree `test $tree != ${tree[1]} && echo ${tree[1]}`
# }
# so e.g. `testit make~95^2:t` will print the `t` tree there and if
# the `t` tree at ~95^2 from the original differs it'll print that too.
# To run it, say `subdir=path/to/it git split-submodule` with whatever
# filter-branch args you want.
# $GIT_COMMIT is set if we're already in filter-branch, if not, get there:
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))
git rm -rq --cached --ignore-unmatch "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
# one id same for all entries, copy mapped mom's submod commit
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
# no mapped parents or something changed somewhere, make new
# submod commit for current subdir content. The new submod
# commit has all mapped parents' submodule commits as parents:
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}
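To make the comments above concrete, here is a small sketch of the core index manipulation, assuming a throwaway repository: the utils subtree is committed on its own, and the index entry for utils is swapped for a gitlink (mode 160000) pointing at that commit. The scratch directory, the file name tool.sh, the demo identity and the commit message are all hypothetical. The real filter does this once per rewritten commit and also wires up the parents, which this sketch leaves out.

```shell
#!/bin/bash
# Throwaway repository; everything in it is illustrative.
demo=$(mktemp -d)
git init -q "$demo"
mkdir "$demo/utils"
echo 'echo hello' > "$demo/utils/tool.sh"
git -C "$demo" add utils

# Write the utils subtree as a tree object and commit it on its own.
tree=$(git -C "$demo" write-tree --prefix=utils/)
sub=$(git -C "$demo" -c user.name=demo -c user.email=demo@example.invalid \
      commit-tree -m 'utils: initial import' "$tree")

# Replace the directory's index entries with a gitlink to that commit.
git -C "$demo" rm -rq --cached --ignore-unmatch utils
git -C "$demo" update-index --add --cacheinfo 160000,"$sub",utils

# Read the entry back: the mode and the recorded commit id.
entry=$(git -C "$demo" ls-files -s utils)
mode=${entry%% *}
recorded=$(printf '%s\n' "$entry" | awk '{print $2}')
echo "$entry"
rm -rf "$demo"
```

The index entry for utils now carries mode 160000 and the id of the standalone commit, which is what the parent history stores for a submodule path.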
Note: the submodule entry is only created when you run, from the parent repo:
git submodule init
git submodule update
You don't need those commands in your rewrite-submodule-tree-filter script, since it is only about correctly setting the .gitmodules file content.
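As a rough sketch of what setting the .gitmodules content can look like, the entries can be written with git config -f, the same way the comments in the script above suggest. The submodule name utils and the URL ./ are borrowed from that answer as placeholders, and the scratch directory stands in for the top level of the parent repo.

```shell
#!/bin/bash
# Scratch directory standing in for the parent repo's top level.
work=$(mktemp -d)

# Write the two keys git submodule looks for; name, path and URL are placeholders.
git config -f "$work/.gitmodules" submodule.utils.path utils
git config -f "$work/.gitmodules" submodule.utils.url ./

# Read them back and show the resulting file.
gitmodules_path=$(git config -f "$work/.gitmodules" --get submodule.utils.path)
gitmodules_url=$(git config -f "$work/.gitmodules" --get submodule.utils.url)
cat "$work/.gitmodules"
rm -rf "$work"
```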
You would execute those "git submodule" commands only when you are using the parent repo for the first time: see "Cloning a Project with Submodules".
I resolved my own question; here is the solution:
git-submodule-split library another_library
Script git-submodule-split:
#!/bin/bash

set -eu

if [ $# -eq 0 ]
then
    echo "Usage: $0 submodules-to-split"
fi

export _tmp=$(mktemp -d)
export _libs="$@"
for i in $_libs
do
    mkdir -p $_tmp/$i
done

git filter-branch --commit-filter '
function gitCommit()
{
    git add -A
    if [ -n "$(git diff --cached --name-only)" ]
    then
        git commit -F $_msg
    fi
} >/dev/null

# from git-filter-branch
git checkout-index -f -u -a || die "Could not checkout the index"
# files that $commit removed are now still in the working tree;
# remove them, else they would be added again
git clean -d -q -f -x

_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
_git_index_file=$GIT_INDEX_FILE
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE

_msg=$(tempfile)
cat /dev/stdin > $_msg
for i in $_libs
do
    if [ -d "$i" ]
    then
        unset GIT_DIR
        unset GIT_WORK_TREE
        unset GIT_INDEX_FILE
        cd $i
        if [ -d ".git" ]
        then
            gitCommit
        else
            git init >/dev/null
            gitCommit
        fi
        cd ..
        rsync -a -rtu $i/.git/ $_tmp/$i/.git/
        export GIT_DIR=$_git_dir
        export GIT_WORK_TREE=$_git_work_tree
        export GIT_INDEX_FILE=$_git_index_file
        git rm -q -r --cached $i
        git submodule add ./$i >/dev/null
        git add $i
    fi
done
rm $_msg
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file

if [ -f ".gitmodules" ]
then
    git add .gitmodules
fi

_new_rev=$(git write-tree)
shift
git commit-tree "$_new_rev" "$@";
' --tag-name-filter cat -- --all

for i in $_libs
do
    if [ -d "$_tmp/$i/.git" ]
    then
        rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
        cd $i
        git reset --hard
        cd ..
    fi
done
rm -r $_tmp

git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done
git reflog expire --expire=now --all
git gc --aggressive --prune=now
Here is an updated answer that works for me on Mac OS X. The major change is the use of pushd/popd to change directories, so that a submodule can be a nested path such as module/glop and not just glop.
#!/bin/bash
set -eu
if [ $# -eq 0 ]
then
echo "Usage: $0 submodules-to-split" >&2
# stop here instead of running filter-branch with nothing to split
exit 1
fi
export _tmp=$(mktemp -d /tmp/git-submodule-split.XXXXXX)
export _libs="$@"
for i in $_libs
do
mkdir -p $_tmp/$i
done
git filter-branch --commit-filter '
function gitCommit()
{
git add -A
if [ -n "$(git diff --cached --name-only)" ]
then
git commit -F $_msg
fi
} >/dev/null
# from git-filter-branch
git checkout-index -f -u -a || die "Could not checkout the index"
# files that $commit removed are now still in the working tree;
# remove them, else they would be added again
git clean -d -q -f -x >&2
_git_dir=$GIT_DIR
_git_work_tree=$GIT_WORK_TREE
_git_index_file=$GIT_INDEX_FILE
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
_msg=$(mktemp /tmp/git-submodule-split-msg.XXXXXX)
cat /dev/stdin > $_msg
for i in $_libs
do
if [ -d "$i" ]
then
unset GIT_DIR
unset GIT_WORK_TREE
unset GIT_INDEX_FILE
pushd $i > /dev/null
if [ -d ".git" ]
then
gitCommit
else
git init >/dev/null
gitCommit
fi
popd > /dev/null
mkdir -p $_tmp/$i
rsync -a -rtu $i/.git/ $_tmp/$i/.git/
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
git rm -q -r --cached $i >&2
git submodule add ./$i $i >&2
git add $i >&2
fi
done
export GIT_DIR=$_git_dir
export GIT_WORK_TREE=$_git_work_tree
export GIT_INDEX_FILE=$_git_index_file
if [ -f ".gitmodules" ]
then
git add .gitmodules >&2
fi
_new_rev=$(git write-tree)
shift
git commit-tree -F $_msg "$_new_rev" $@;
rm -f $_msg
' --tag-name-filter cat -- --all
for i in $_libs
do
if [ -d "$_tmp/$i/.git" ]
then
rsync -a -i -rtu $_tmp/$i/.git/ $i/.git/
pushd $i
git reset --hard
popd
fi
done
rm -rf $_tmp
git for-each-ref refs/original --format="%(refname)" | while read i; do git update-ref -d $i; done
git reflog expire --expire=now --all
git gc --aggressive --prune=now
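To illustrate why the pushd/popd change matters for nested paths, here is a small sketch that needs no Git at all. It assumes a scratch directory containing module/glop; both the directory and the variable names are made up for the demonstration.

```shell
#!/bin/bash
orig=$PWD
root=$(mktemp -d)                 # scratch directory standing in for the repo root
mkdir -p "$root/module/glop"
cd "$root" && root=$(pwd -P)      # resolve symlinks so the paths compare cleanly

i=module/glop                     # a nested submodule path

# Earlier approach: cd in, then "cd .." climbs a single level only.
cd "$i"
cd ..
after_cd=$(pwd -P)

# Updated approach: pushd/popd restore the exact starting directory.
cd "$root"
pushd "$i" > /dev/null
popd > /dev/null
after_popd=$(pwd -P)

echo "after cd ..: $after_cd"
echo "after popd:  $after_popd"

cd "$orig"
rm -rf "$root"
```

With a one-level path like glop both approaches end up in the same place. With module/glop the cd .. variant is left inside module, so the later rsync and git commands would run from the wrong directory.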