I've been using the git-subtree extension (https://github.com/apenwarr/git-subtree) to manage sub-projects within our main project. It's doing exactly what I want other t
@Chris Johnsen's answer is correct; it explains why splitting works in the clone situation but not in the pull situation.
As for the workaround provided in the question and explained in footnote 2 of @Chris Johnsen's answer, I can confirm that git subtree split -P Some/Sub/Dir -b splitBranch --ignore-joins and git subtree split -P Some/Sub/Dir -b splitBranch 43b3eb7.. actually produce the same commit and the same branch. That branch reflects the modifications made in the local repo, but it cannot be pushed to the original repoLib repo because the two have no common ancestor, even though git diff shows that d76a03f0ec7e2 and 43b3eb7d69d are the same.
So, to get subtree push working in the pull situation, the original repoLib repository must be added as a remote and fetched so that d76a03f0ec7e2 exists locally. The split can then produce a branch that has a common ancestor with the original repoLib.
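The steps above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name push_subtree_changes, the remote name lib, and the target branch argument are hypothetical, while splitBranch and the -P prefix option come from the commands quoted above. It has not been run against the reproduction script.

```shell
# Hypothetical helper; "lib" is a placeholder remote name and the
# arguments are supplied by the caller.
push_subtree_changes() {
    if [ "$#" -ne 3 ]; then
        echo "usage: push_subtree_changes <lib-url> <prefix> <target-branch>" >&2
        return 2
    fi
    lib_url=$1   # URL of the subtree's upstream repository (repoLib)
    prefix=$2    # subtree directory, e.g. Some/Sub/Dir
    target=$3    # branch in the upstream repository to push to

    # Make the subtree's original history available locally.
    git remote get-url lib >/dev/null 2>&1 || git remote add lib "$lib_url" || return 1
    git fetch lib || return 1

    # With that history present, the split can build on top of it.
    git subtree split -P "$prefix" -b splitBranch || return 1

    # Publish the split history to the upstream repository.
    git push lib "splitBranch:$target"
}
```

It would be called from inside the pulled repository, for example as push_subtree_changes git@host:the-lib-repository Some/Sub/Dir some-branch, where some-branch is again just a placeholder.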
The original reproduction script did not run smoothly under Linux; here is a new one: http://pastebin.com/3NAQKEz9
The purpose of git subtree split is to create some new commits (representing “local” changes originally made in the subtree’s local directory) on top of the subtree’s original history. Since it directly involves the subtree’s original history (as the parent commit of the first rewritten local commit that touches the subtree), the split operation cannot be done without the subtree’s original history itself being present.
Think about what you will be doing with the history that git subtree split generates. You will probably want to push it to a repository where you can merge it into the rest of the “upstream” history. In order for this merge operation to make sense, the split history needs to be based on the original history itself [1].
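To make the "common ancestor" requirement concrete, here is a minimal sketch that uses only core git in a throwaway directory. The file names, the branch name unrelated, and the demo identity are made up for illustration. It builds two parentless histories in one repository and asks git merge-base whether they share an ancestor.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Wrapper that supplies a throwaway identity so commits work anywhere.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# First history: stands in for the subtree's original (upstream) history.
echo one > a.txt
git add a.txt
g commit -q -m "root of first history"
first=$(git rev-parse HEAD)

# Second history: a new root commit with no parents, like a split
# that was generated without the original history.
git checkout -q --orphan unrelated
git rm -q -rf .
echo two > b.txt
git add b.txt
g commit -q -m "root of second history"
second=$(git rev-parse HEAD)

# git merge-base exits non-zero when there is no common ancestor.
if git merge-base "$first" "$second" >/dev/null; then
    related=yes
else
    related=no
fi
echo "common ancestor: $related"
```

Without a merge base, a merge has nothing to use as the third side of a three-way merge, which is the situation the split history ends up in when it is not built on the upstream history.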
Probably the most reliable way to arrange for users to have the subtree’s original history is to publish the URL for the subtree’s upstream repository in your documentation and have them define a remote for it (it is perfectly fine to have “unrelated” remotes in a single repository). E.g.
If you need to work with the “upstream” of Some/Sub/Dir (to pull in external changes or push out local changes), please define and update a remote for the library’s repository before using git subtree:
git remote add lib git@host:the-lib-repository && git fetch lib
You would need to do something like this even if you were not using --squash, since users would need to know where to get new upstream commits (and where, ultimately, to push new split-generated commits).
Using --squash gives you a “clean” history in your main project and means that only those users that need to deal with the subtree’s “upstream” actually have to have its objects in their repositories.
It seems like you have a good understanding of the object model. You are correct that the history that git subtree add --squash pulls in will become dangling [2], but that git subtree split can still use it until it is pruned away.
(with reference to your reproduction script)
You are able to split successfully in your repoMainClone only because local clones automatically hardlink (or copy) all the files in .git/objects/ (thus getting access to repoMain’s copies of the dangling (or nearly dangling [2]) objects from repoLib) instead of using the usual “pack protocol” transport (which would limit the transferred objects to only those needed for the transferred refs, i.e. omitting anything from repoLib). Your repoMainPull is effectively equivalent to cloning with git clone file://"$(pwd)"/repoMain repoMainCloneFile (the file:// URL forces local clones to use pack-based transfers instead of just linking/copying everything).
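The difference between the two kinds of clone can be observed with a small experiment. This is a sketch under assumptions, not your reproduction script: the names cloneLocal and cloneFile are made up, and an unreferenced blob written with git hash-object stands in for the dangling repoLib objects.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Wrapper that supplies a throwaway identity so commits work anywhere.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

git init -q repoMain
echo main > repoMain/main.txt
g -C repoMain add main.txt
g -C repoMain commit -q -m "initial commit"

# Write an object that no ref points to (stand-in for the dangling
# repoLib history left behind by a squashed subtree add).
orphan=$(echo "unreferenced" | git -C repoMain hash-object -w --stdin)

# Plain path: the object directory is hardlinked or copied wholesale.
git clone -q repoMain cloneLocal
# file:// URL: objects travel through the pack-based transport.
git clone -q "file://$demo/repoMain" cloneFile

has_object() { git -C "$1" cat-file -e "$2" 2>/dev/null && echo yes || echo no; }
local_has=$(has_object cloneLocal "$orphan")
file_has=$(has_object cloneFile "$orphan")
echo "path clone has it: $local_has; file:// clone has it: $file_has"
```

The path-based clone should end up with the unreferenced object and the file:// clone should not, which mirrors why the split succeeds in repoMainClone but not in repoMainPull.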
[1] Actually, you can directly merge unrelated histories, but you lose the ability to do three-way merges (since there is no common ancestor). This would be quite a sacrifice.
Your proposed git subtree split -P Some/Sub/Dir 43b3eb7^.. --ignore-joins … (where 43b3eb7 is the synthetic commit that resulted from git subtree add --squash …) would generate an unrelated history (except it needs to be 43b3eb7.. since 43b3eb7^ means “the first parent of 43b3eb7” and 43b3eb7 has no parents). I am not sure that git subtree split was designed to take ranges like this, though. The documentation for git subtree split just says <commit>, but never really mentions its purpose. Reading the code shows that it defaults to HEAD, which might indicate that it is intended to be a single commit specifying the “tip” of the history that should be processed for splitting. Also, turning on the debug output shows an “incorrect order:” message, which might indicate that using a range argument is putting the split operation in an unexpected situation (it is expecting to have processed all of the parents of a commit before processing the commit itself, but the range ensures that 43b3eb7 (which is the parent of the subtree merge commit) is never processed). I think you can just use --ignore-joins and leave off the range if you want to generate “unrelated” history and try to use it in some way: git subtree split -P Some/Sub/Dir --ignore-joins …
[2] They are not actually dangling immediately after git subtree add --squash because they are still referenced by FETCH_HEAD. Once an unrelated fetch is done, however, they will become truly dangling.