Child git repository as subset of a main repository

太阳男子 2021-02-08 12:03

I'm looking for a way to set up git repositories that include subsets of files from a larger repository, and inherit the history from that main repository. My primary motivat

  •  失恋的感觉
    2021-02-08 12:44

    As I understand your question:

    • you have one big repo containing multiple subprojects
    • you want to extract and share each subproject as its own repository, still containing the history/commits for (only) that subproject
    • the subprojects share some files, so the files used by one subproject are not confined to a single subdirectory (one file may belong to several subprojects); this is why you can't simply use git subtree or git submodules

    One way to extract the history of just a subset of the files into a dedicated branch (which you then can push into a dedicated repository) is using git filter-branch:

    # regex to match the files included in this subproject, used below
    # (hypothetical example pattern; adapt it to your own layout)
    file_list_regex='^subproject1/|^shared_file$'
    git checkout -b subproject1 # create new branch from current HEAD
    git filter-branch --prune-empty \
      --index-filter "git ls-files --cached | grep -v -E '$file_list_regex' | xargs -r git rm --cached" \
      HEAD

    This will

    • first create a new branch subproject1 based on the current HEAD (git checkout -b subproject1)
    • traverse its whole history (git filter-branch [...] HEAD)
    • remove all files (xargs -r git rm --cached) that are not part of the subproject (git ls-files --cached | grep -v -E '$file_list_regex')
    • drop all commits that did not touch any of the subproject files from that branch (--prune-empty)
    • operate only on the index instead of checking out each revision (--index-filter/--cached)
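
    To see which paths the index filter would remove, the selection step can be tried on a plain list of file names, without touching any repository. This is only a minimal sketch: the paths and the value of file_list_regex are made-up examples, and printf stands in for git ls-files --cached.

```shell
#!/bin/sh
# Hypothetical pattern: keep everything under subproject1/ plus one shared file.
file_list_regex='^subproject1/|^shared/util\.sh$'

# Stand-in for `git ls-files --cached`: one path per line (made-up paths).
list_files() {
  printf '%s\n' \
    'subproject1/main.c' \
    'subproject2/main.c' \
    'shared/util.sh' \
    'README.md'
}

# grep -v inverts the match, so only paths NOT in the subproject remain;
# these are the paths the index filter would hand to `git rm --cached`.
to_remove="$(list_files | grep -v -E "$file_list_regex")"
printf '%s\n' "$to_remove"
```

    Running the selection like this before a real filter-branch run is a cheap way to check that the pattern keeps what you expect.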

    This is a one-time operation, though, and as I understand your question you want to continuously update the extracted subproject repositories/branches with new commits. The good news is that you can simply repeat this command, since git filter-branch will always produce the same commits/history for your subproject branches, provided that you don't manually alter them or rewrite your master branch.

    The drawback is that this re-filters the complete history each time, and for each subproject. If you only want to add the last 5 commits of the master branch to the tip of your existing subproject1 branch, you could adapt the commands like this:

    # get the full commit ids for the commits we consider
    # to be equivalent in master and subproject1 branch
    # (in a linear history, master~5 is the parent of the oldest of the last 5 commits)
    common_base_commit="$(git rev-parse master~5)"
    subproject_tip="$(git rev-parse subproject1)"
    # checkout a detached HEAD so we don't change the master branch
    git checkout --detach master
    git filter-branch --prune-empty \
      --index-filter "git ls-files --cached | grep -v -E '$file_list_regex' | xargs -r git rm --cached" \
      --parent-filter "sed s/${common_base_commit}/${subproject_tip}/g" \
      "${common_base_commit}..HEAD"
    # force reset subproject1 branch to current HEAD
    git branch -f subproject1

    • This will only rewrite the last 5 commits (git filter-branch [...] ${common_base_commit}..HEAD), i.e. those after master~5, which we consider to be the equivalent commit to subproject1's current tip.
    • For (the first of) those commits it will rewrite the parent from master~5 to subproject1 (--parent-filter "sed s/${common_base_commit}/${subproject_tip}/g"), effectively rebasing the 5 rewritten commits on top of subproject1.
    • Finally we only need to update subproject1 to include the new commits on top of it.
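
    The parent filter receives the parent list of the commit being rewritten on standard input, in the form -p <commit-id>, and must print the new parent list. The substitution can be tried in isolation; in this sketch the ids are made-up placeholders, not real commits.

```shell
#!/bin/sh
# Made-up 40-character ids standing in for the real commits.
common_base_commit='1111111111111111111111111111111111111111'
subproject_tip='2222222222222222222222222222222222222222'

# Same substitution as in the --parent-filter above.
rewrite_parents() {
  sed "s/${common_base_commit}/${subproject_tip}/g"
}

# A commit whose parent is the common base gets re-parented ...
first="$(printf '%s\n' "-p ${common_base_commit}" | rewrite_parents)"
# ... while any other parent string passes through unchanged.
other="$(printf '%s\n' '-p 3333333333333333333333333333333333333333' | rewrite_parents)"

printf '%s\n%s\n' "$first" "$other"
```

    Only the oldest rewritten commit has the common base as its parent, so it is the only one whose parent list actually changes.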

    Further optimization/automation:

    • implement a better logic to list the files you want to include ($file_list_regex) or actually to exclude (git ls-files --cached | grep -v -E '$file_list_regex') from a given subproject
    • make the list of files to include depend on the current commit ($GIT_COMMIT), or check the list in to the repository itself, in case the files to include per subproject may change over time
    • find an automated way to find the 'equivalent' commit of a subproject branch's tip in the current master
    • combine all of it in a nice git alias so you can simply use git update-project subproject1
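
    One possible take on the first two points is to keep the paths of a subproject in a plain list file and let grep read it, instead of maintaining one long regular expression. The sketch below assumes a hypothetical list file with one exact path per line; printf stands in for git ls-files --cached, and all names are made up.

```shell
#!/bin/sh
# Hypothetical list file: one exact path per line (no patterns, no directories).
list_file="$(mktemp)"
printf '%s\n' 'subproject1/main.c' 'shared/util.sh' > "$list_file"

# Stand-in for `git ls-files --cached` (made-up paths).
list_files() {
  printf '%s\n' 'subproject1/main.c' 'subproject2/main.c' 'shared/util.sh'
}

# -F: treat the entries as fixed strings, -x: match whole lines only,
# -f: read the entries from the list file, -v: keep the paths NOT listed.
to_remove="$(list_files | grep -v -x -F -f "$list_file")"
printf '%s\n' "$to_remove"

rm -f "$list_file"
```

    For the set of files to change over time, the list would have to be read from the commit being rewritten ($GIT_COMMIT), which this sketch does not attempt.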
