Question
I have a shallow cloned git repository that is over 1 GB. I use sparse checkout for the files/dirs needed.
How can I reduce the repository clone to just the sparse checkout files/dirs?
Initially I was able to limit the cloned repository to only the sparse checkout by disabling checkout when cloning, then setting up sparse checkout before doing the initial checkout. This limited the repository to only about 200 MB, which is much more manageable. However, updating remote branch info at some point in the future causes the rest of the files and dirs to be included in the repository clone, sending the clone size back to over 1 GB, and I don't know how to limit it to just the sparse checkout files and dirs.
In short what I want is a shallow AND sparse repository clone. Not just sparse checkout of a shallow repo clone. The full repo is a waste of space and performance for certain tasks suffers.
Hope someone can share a solution. Thanks.
Answer 1:
Shallow and sparse means "partial" or "narrow".
A partial clone (or "narrow clone") is in theory possible, and was implemented first in Dec 2017 with Git 2.16, as seen here.
But:
- only with Git 2.18 could you do such a partial clone: see here for a test example.
- only with a server supporting a transport protocol V2, and Git 2.19: that would ensure that only the minimal amount of data is indeed transferred.
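As a rough sketch of what a shallow plus partial plus sparse clone can look like with a recent Git, the script below builds a tiny throwaway repository to stand in for the real remote and then clones it. The directory names (origin-repo, sparse-clone, needed, unneeded) are made up for the demo, and the two uploadpack settings are only there so that the local stand-in accepts filtered requests; a real hosting server has to support this on its side.

```shell
#!/bin/sh
set -eu

# Scratch area; every path and name below is made up for the demo.
work=$(mktemp -d)
cd "$work"

# 1. Build a tiny stand-in for the big upstream repository.
git init -q origin-repo
cd origin-repo
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir needed unneeded
echo "v1" > needed/a.txt
echo "v1" > unneeded/b.txt
git add .
git commit -q -m "first"
echo "v2" > needed/a.txt
git commit -q -am "second"
# The serving side has to allow object filters and on-demand object requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd "$work"

# 2. Shallow (--depth 1) and partial (--filter=blob:none) clone, without
#    populating the working tree yet. file:// forces the normal transport,
#    since a plain local path would ignore --depth and --filter.
git clone -q --depth 1 --filter=blob:none --no-checkout \
    "file://$work/origin-repo" sparse-clone
cd sparse-clone

# 3. Restrict the working tree to the wanted directory, then check out.
git sparse-checkout init --cone
git sparse-checkout set needed
git checkout -q "$(git symbolic-ref --short HEAD)"

# 4. Show what ended up on disk and how much history was fetched.
ls
git rev-list --count HEAD
```

Note that the git sparse-checkout command needs Git 2.25 or newer; with older versions the same patterns would have to be written to .git/info/sparse-checkout by hand. As far as I know, the clone records the filter in remote.origin.partialclonefilter, so later fetches from origin should keep omitting blobs rather than pulling everything back in, but it is worth checking the clone size after a fetch on your own setup.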
Partial clone is further optimized in Git 2.20 (Q4 2018), since in a partial clone that will lazily be hydrated from the originating repository, we generally want to avoid "does this object exist (locally)?" checks on objects that we deliberately omitted when we created the (partial/sparse) clone.
The cache-tree codepath (which is used to write a tree object out of the index) however insisted that the object exists, even for paths that are outside of the partial checkout area.
The code has been updated to avoid such a check.
See commit 2f215ff (09 Oct 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit a08b1d6, 19 Oct 2018)
cache-tree: skip some blob checks in partial clone
In a partial clone, whenever a sparse checkout occurs, the existence of all blobs in the index is verified, whether they are included or excluded by the .git/info/sparse-checkout specification.
This significantly degrades performance because a lazy fetch occurs whenever the existence of a missing blob is checked.
With Git 2.24 (Q4 2019), the cache-tree code has been taught to be less aggressive in attempting to see if a tree object it computed already exists in the repository.
See commit f981ec1 (03 Sep 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit ae203ba, 07 Oct 2019)
cache-tree: do not lazy-fetch tentative tree
The cache-tree datastructure is used to speed up the comparison between the HEAD and the index, and when the index is updated by a cherry-pick (for example), a tree object that would represent the paths in the index in a directory is constructed in-core, to see if such a tree object exists already in the object store.
When the lazy-fetch mechanism was introduced, we converted this "does the tree exist?" check into an "if it does not, and if we lazily cloned, see if the remote has it" call by mistake.
Since the whole point of this check is to repair the cache-tree by recording an already existing tree object opportunistically, we shouldn't even try to fetch one from the remote.
Pass the OBJECT_INFO_SKIP_FETCH_OBJECT flag to make sure we only check for existence in the local object store without triggering the lazy fetch mechanism.
Source: https://stackoverflow.com/questions/52526540/shallow-and-sparse-git-repository-clone