I have a situation with a relatively large git repository located in a virtual machine on an elderly, slow host on my local network where it takes quite a while to do the initial clone.
The git clone --depth=1 ... suggested in 2014 will become faster in Q2 2019 with Git 2.22.
That is because, during an initial "git clone --depth=..." partial clone, it is pointless to spend cycles on a large portion of the connectivity check that enumerates and skips promisor objects (which by definition is all objects fetched from the other side).
This has been optimized out.
clone: do faster object check for partial clones
For partial clones, doing a full connectivity check is wasteful; we skip promisor objects (which, for a partial clone, is all known objects), and enumerating them all to exclude them from the connectivity check can take a significant amount of time on large repos.
At most, we want to make sure that we get the objects referred to by any wanted refs.
For partial clones, just check that these objects were transferred.
Result:

    Test                          dfa33a2^            dfa33a2
    -------------------------------------------------------------------------
    5600.2: clone without blobs   18.41(22.72+1.09)   6.83(11.65+0.50)  -62.9%
    5600.3: checkout of result     1.82(3.24+0.26)    1.84(3.24+0.26)   +1.1%

62% faster!
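To make the cost difference concrete, here is a toy model in Python. All names are hypothetical and the object graph is a plain dict, so this is a sketch under those assumptions, not Git's code. The "full" check enumerates every promisor object to mark it UNINTERESTING before walking, while the partial-clone check only confirms that the objects the wanted refs point to were transferred.

```python
"""Toy model of the two connectivity checks discussed above.

Everything here is hypothetical: objects are strings, the object graph is a
dict, and the function names are made up. It is not Git's implementation.
"""
from typing import Dict, List, Set


def chain(n: int) -> Dict[str, List[str]]:
    """Build a linear history c0 <- c1 <- ... <- c(n-1)."""
    return {f"c{i}": ([f"c{i - 1}"] if i > 0 else []) for i in range(n)}


def full_check(graph: Dict[str, List[str]], promisor: Set[str],
               wanted: List[str]) -> int:
    """Enumerate all promisor objects to mark them UNINTERESTING, then walk
    from the wanted tips. Returns the number of objects touched."""
    touched = 0
    uninteresting: Set[str] = set()
    for oid in promisor:               # the costly enumeration step
        uninteresting.add(oid)
        touched += 1
    seen: Set[str] = set()
    stack = list(wanted)
    while stack:
        oid = stack.pop()
        if oid in seen or oid in uninteresting:
            continue
        seen.add(oid)
        touched += 1
        if oid not in graph:
            raise LookupError(f"missing object {oid}")
        stack.extend(graph[oid])
    return touched


def fast_check(transferred: Set[str], wanted: List[str]) -> int:
    """Only confirm that the objects the wanted refs point to arrived.
    Returns the number of objects touched."""
    for oid in wanted:
        if oid not in transferred:
            raise LookupError(f"missing object {oid}")
    return len(wanted)


if __name__ == "__main__":
    history = chain(1000)
    print(full_check(history, set(history), ["c999"]))  # 1000: all enumerated
    print(fast_check(set(history), ["c999"]))           # 1: only the ref tip
```

In a partial clone every fetched object is a promisor object, so the full check's cost grows with the size of the repository while the fast check's cost grows only with the number of wanted refs.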
With Git 2.26 (Q1 2020), an unneeded connectivity check is now disabled in a partial clone when fetching into it.
See commit 2df1aa2, commit 5003377 (12 Jan 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 8fb3945, 14 Feb 2020)
connected: verify promisor-ness of partial clone
Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder
Commit dfa33a298d ("clone: do faster object check for partial clones", 2019-04-21, Git v2.22.0-rc0 -- merge) optimized the connectivity check done when cloning with --filter to check only the existence of objects directly pointed to by refs.
But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are promisor objects, that is, they appear in a promisor pack.
And:
fetch: forgo full connectivity check if --filter
Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder
If a filter is specified, we do not need a full connectivity check on the contents of the packfile we just fetched; we only need to check that the objects referenced are promisor objects.
This significantly speeds up fetches into repositories that have many promisor objects, because during the connectivity check, all promisor objects are enumerated (to mark them UNINTERESTING), and that takes a significant amount of time.
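The difference between the Git 2.22 check (the ref targets merely exist) and the Git 2.26 check (the ref targets appear in a promisor pack), which the fetch --filter change also relies on, can be sketched as follows. Pack, ObjectStore and both check functions are hypothetical stand-ins, assuming an object store made of packs flagged as promisor or not, plus loose objects.

```python
"""Sketch of 'exists' vs 'appears in a promisor pack' for ref tips.

Pack, ObjectStore and both check functions are hypothetical stand-ins;
in Git, a promisor pack is the one marked with a .promisor file.
"""
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Pack:
    objects: Set[str]
    promisor: bool  # stands in for the presence of a .promisor marker file


@dataclass
class ObjectStore:
    packs: List[Pack] = field(default_factory=list)
    loose: Set[str] = field(default_factory=set)

    def has_object(self, oid: str) -> bool:
        return oid in self.loose or any(oid in p.objects for p in self.packs)

    def in_promisor_pack(self, oid: str) -> bool:
        return any(p.promisor and oid in p.objects for p in self.packs)


def existence_check(store: ObjectStore, tips: Iterable[str]) -> bool:
    """Git 2.22 behaviour as described above: the ref targets must exist."""
    return all(store.has_object(t) for t in tips)


def promisor_check(store: ObjectStore, tips: Iterable[str]) -> bool:
    """Git 2.26 behaviour: the ref targets must sit in a promisor pack."""
    return all(store.in_promisor_pack(t) for t in tips)


if __name__ == "__main__":
    store = ObjectStore(packs=[Pack({"a", "b"}, promisor=True)], loose={"c"})
    print(existence_check(store, ["a", "c"]))  # True: "c" exists (loose)
    print(promisor_check(store, ["a", "c"]))   # False: "c" is in no promisor pack
```

Both checks only look at the ref targets, which is what keeps them cheap; the stricter one simply asks a more specific question about each target.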
And, still with Git 2.26 (Q1 2020), the object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, whereas the bitmap machinery is an optimization that bypasses that traversal.
There are, however, some cases where they can work together, and they have been taught about them.
See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (gitster).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0df82d9, 02 Mar 2020)
pack-bitmap: implement BLOB_LIMIT filtering
Signed-off-by: Jeff King
Just as the previous commit implemented BLOB_NONE, we can support BLOB_LIMIT filters by looking at the sizes of any blobs in the result and unsetting their bits as appropriate.
This is slightly more expensive than BLOB_NONE, but still produces a noticeable speedup (these results are on git.git):

    Test                                         HEAD~2            HEAD
    ------------------------------------------------------------------------------------
    5310.9: rev-list count with blob:none        1.80(1.77+0.02)   0.22(0.20+0.02)  -87.8%
    5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03)  -85.4%

The implementation is similar to the BLOB_NONE one, with the exception that we have to go object-by-object while walking the blob-type bitmap (since we can't mask out the matches, but must look up the size individually for each blob).
The trick with using ctz64() is taken from show_objects_for_type(), which likewise needs to find individual bits (but wants to quickly skip over big chunks without blobs).
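A minimal sketch of the two bitmap filters, assuming bitmaps stored as lists of 64-bit words (Python ints) and a hypothetical size_of(pos) lookup standing in for reading an object's size. It also assumes, as the git rev-list documentation describes blob:limit=<n>, that blobs of size at least n are omitted. BLOB_NONE clears all blob bits a word at a time, while BLOB_LIMIT visits each set bit using a ctz64() equivalent.

```python
"""Sketch of BLOB_NONE vs BLOB_LIMIT filtering over word-based bitmaps.

Bitmaps are lists of 64-bit words held in Python ints; size_of(pos) is a
hypothetical lookup standing in for reading an object's size from the pack.
"""
from typing import Callable, List

WORD_BITS = 64
MASK64 = (1 << WORD_BITS) - 1


def ctz64(word: int) -> int:
    """Count trailing zero bits of a non-zero 64-bit word."""
    return (word & -word).bit_length() - 1


def filter_blob_none(result: List[int], blob_type: List[int]) -> None:
    """BLOB_NONE: every blob goes, so whole words can be masked at once."""
    for i, blobs in enumerate(blob_type):
        result[i] &= ~blobs & MASK64


def filter_blob_limit(result: List[int], blob_type: List[int],
                      size_of: Callable[[int], int], limit: int) -> None:
    """BLOB_LIMIT: each candidate blob needs its own size lookup."""
    for i, blobs in enumerate(blob_type):
        word = blobs & result[i]       # blobs that are in the result
        while word:                    # a zero word is passed over at once
            offset = ctz64(word)       # jump straight to the next set bit
            if size_of(i * WORD_BITS + offset) >= limit:
                result[i] &= ~(1 << offset) & MASK64
            word &= word - 1           # clear the lowest set bit


if __name__ == "__main__":
    result = [0b1111]                  # objects 0..3 are reachable
    blobs = [0b1010]                   # objects 1 and 3 are blobs
    sizes = {1: 500, 3: 2048}
    filter_blob_limit(result, blobs, sizes.__getitem__, 1024)
    print(bin(result[0]))              # 0b111: the 2048-byte blob was dropped
```

The per-bit loop is why BLOB_LIMIT costs a bit more than BLOB_NONE: the mask alone cannot decide, so every surviving blob bit triggers a size lookup.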
Git 2.27 (Q2 2020) will simplify the commit ancestry connectedness check in a partial clone repository in which "promised" objects are assumed to be obtainable lazily on-demand from promisor remote repositories.
See commit 2b98478 (20 Mar 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 0c60105, 22 Apr 2020)
connected: always use partial clone optimization
Signed-off-by: Jonathan Tan
Reviewed-by: Josh Steadmon
With 50033772d5 ("connected: verify promisor-ness of partial clone", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), the fast path (checking promisor packs) in check_connected() now passes a subset of the slow path (rev-list): if all objects to be checked are found in promisor packs, both the fast path and the slow path will pass; otherwise, the fast path will definitely not pass.
This means that we can always attempt the fast path whenever we need to do the slow path.
The fast path is currently guarded by a flag; therefore, remove that flag.
Also, make the fast path fall back to the slow path: if the fast path fails, the failing OID and all remaining OIDs will be passed to rev-list.
The main user-visible benefit is the performance of fetch from a partial clone, specifically the speedup of the connectivity check done before the fetch.
In particular, a no-op fetch into a partial clone on my computer was sped up from 7 seconds to 0.01 seconds.
This is a complement to the work in 2df1aa239c ("fetch: forgo full connectivity check if --filter", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), which is the child of the aforementioned 50033772d5. In that commit, the connectivity check after the fetch was sped up.
The addition of the fast path might cause performance reductions in these cases:
- If a partial clone or a fetch into a partial clone fails, Git will fruitlessly run rev-list (it is expected that everything fetched would go into promisor packs, so if that didn't happen, it is most likely that rev-list will fail too).
- Any connectivity checks done by receive-pack, in the (in my opinion, unlikely) event that a partial clone serves receive-pack.
I think that these cases are rare enough, and the performance reduction in this case minor enough (additional object DB access), that the benefit of avoiding a flag outweighs these.
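The fast-path-then-fallback logic described in that commit can be sketched like this. The function and callback names are hypothetical; the two callbacks stand in for the promisor-pack lookup and for running rev-list, and this is not the C signature of check_connected().

```python
"""Sketch of a connectivity check with a fast path and a slow-path fallback.

check_connected_sketch, in_promisor_pack and run_rev_list are hypothetical
names; the real logic lives in Git's check_connected(), written in C.
"""
from typing import Callable, List, Sequence


def check_connected_sketch(oids: Sequence[str],
                           in_promisor_pack: Callable[[str], bool],
                           run_rev_list: Callable[[List[str]], bool]) -> bool:
    for i, oid in enumerate(oids):
        if not in_promisor_pack(oid):
            # Fast path failed: hand this OID and all remaining ones to the
            # slow path instead of starting over from the beginning.
            return run_rev_list(list(oids[i:]))
    # Every OID was found in a promisor pack, so the slow path would pass too.
    return True
```

Trying the fast path first is safe because, as the commit explains, it passes only when the slow path would also pass; the worst case is the extra lookups done before falling back.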
With Git 2.27 (Q2 2020), the object walk with object filter "--filter=tree:0" can now take advantage of the pack bitmap when available.
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (peff).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 69ae8ff, 13 May 2020)
pack-bitmap.c: support 'tree:0' filtering
Signed-off-by: Taylor Blau
In the previous patch, we made it easy to define other filters that exclude all objects of a certain type. Use that in order to implement bitmap-level filtering for the '--filter=tree:<n>' filter when 'n' is equal to 0.
The general case is not helped by bitmaps, since for values of 'n > 0', the object filtering machinery requires a full-blown tree traversal in order to determine the depth of a given tree.
Caching this is non-obvious, too, since the same tree object can have a different depth depending on the context (e.g., a tree was moved up in the directory hierarchy between two commits).
But the 'n = 0' case can be helped, and this patch does so.
Running p5310.11 in this tree and on master with the kernel, we can see that this case is helped substantially:

    Test                                   master              this tree
    --------------------------------------------------------------------------------
    5310.11: rev-list count with tree:0    10.68(10.39+0.27)   0.06(0.04+0.01)  -99.4%

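For n = 0 the filter reduces to bitwise operations on per-type bitmaps. A sketch, assuming bitmaps are Python ints and using hypothetical helper names: it clears every tree and blob bit but keeps objects named explicitly, which the git rev-list documentation says tree:0 still includes.

```python
"""Sketch of tree:0 filtering on a reachability bitmap.

Bitmaps are Python ints (bit i set = object i present); type_bitmaps and
filter_tree_depth_zero are hypothetical helpers, not Git functions.
"""
from typing import Dict, List


def type_bitmaps(types: List[str]) -> Dict[str, int]:
    """One bitmap per object type, like the per-type bitmaps in a .bitmap file."""
    maps: Dict[str, int] = {}
    for pos, kind in enumerate(types):
        maps[kind] = maps.get(kind, 0) | (1 << pos)
    return maps


def filter_tree_depth_zero(result: int, maps: Dict[str, int], tips: int = 0) -> int:
    """Clear every tree and blob bit, except objects requested explicitly (tips)."""
    excluded = (maps.get("tree", 0) | maps.get("blob", 0)) & ~tips
    return result & ~excluded


if __name__ == "__main__":
    maps = type_bitmaps(["commit", "tree", "blob", "commit", "tree"])
    print(bin(filter_tree_depth_zero(0b11111, maps)))  # 0b1001: commits only
```

No tree is ever opened here, which fits the large p5310.11 speedup; for n > 0 a tree's depth depends on the path that reached it, so no single type mask can express the filter.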
And:
pack-bitmap: pass object filter to fill-in traversal
Signed-off-by: Jeff King
Signed-off-by: Taylor Blau
Sometimes a bitmap traversal still has to walk some commits manually, because those commits aren't included in the bitmap packfile (e.g., due to a push or commit since the last full repack).
If we're given an object filter, we don't pass it down to this traversal.
It's not necessary for correctness because the bitmap code has its own filters to post-process the bitmap result (which it must, to filter out the objects that are mentioned in the bitmapped packfile).
And with blob filters, there was no performance reason to pass along those filters, either. The fill-in traversal could omit them from the result, but it wouldn't save us any time to do so, since we'd still have to walk each tree entry to see if it's a blob or not.
But now that we support tree filters, there's opportunity for savings. A tree:depth=0 filter means we can avoid accessing trees entirely, since we know we won't include them (or any of the subtrees or blobs they point to).
The new test in p5310 shows this off (the "partial bitmap" state is one where HEAD~100 and its ancestors are all in a bitmapped pack, but HEAD~100..HEAD are not).
Here are the results (run against linux.git):

    Test                                                  HEAD^             HEAD
    -------------------------------------------------------------------------------------------------
    [...]
    5310.16: rev-list with tree filter (partial bitmap)   0.19(0.17+0.02)   0.03(0.02+0.01)  -84.2%

The absolute number of savings isn't huge, but keep in mind that we only omitted 100 first-parent links (in the version of linux.git here, that's 894 actual commits).
In a more pathological case, we might have a much larger proportion of non-bitmapped commits. I didn't bother creating such a case in the perf script because the setup is expensive, and this is plenty to show the savings as a percentage.
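The effect of passing the filter down to the fill-in traversal can be sketched with a toy repository. The dict layout, fill_in and its tree_depth parameter are all hypothetical: commits already covered by the bitmap stop the walk, and with tree_depth=0 the walk never opens a tree.

```python
"""Sketch of a fill-in traversal that honours a tree:0 filter.

The repository layout (commit -> (parents, root tree), tree -> entries),
fill_in and tree_depth are hypothetical; Git's traversal is far richer.
"""
from typing import Dict, List, Optional, Set, Tuple

Commits = Dict[str, Tuple[List[str], str]]
Trees = Dict[str, List[str]]


def fill_in(tip: str, commits: Commits, trees: Trees, bitmapped: Set[str],
            tree_depth: Optional[int] = None) -> Tuple[Set[str], int]:
    """Walk commits the bitmap does not cover.

    Returns the objects found and how many trees had to be opened."""
    found: Set[str] = set()
    trees_opened = 0
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in found or commit in bitmapped:
            continue                   # already covered by the bitmap
        found.add(commit)
        parents, root = commits[commit]
        stack.extend(parents)
        if tree_depth == 0:
            continue                   # filter passed down: never open trees
        pending = [root]
        while pending:
            obj = pending.pop()
            if obj in found:
                continue
            found.add(obj)
            if obj in trees:           # anything not in `trees` is a blob
                trees_opened += 1
                pending.extend(trees[obj])
    return found, trees_opened
```

In this sketch the filtered result is also smaller, but in Git the bitmap code post-filters the result anyway, so the real gain is the tree reads that are skipped, not a different answer.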