I'm hosting a git repo on a shared host. My repo necessarily has a couple of very large files in it, and every time I try to run "git gc" on the repo now, my process gets killed by the shared hosting provider for using too much memory. Is there a way to limit the amount of memory that git gc can consume? My hope would be that it can trade memory usage for speed and just take a little longer to do its work.
Yes, have a look at the help page for git config and look at the pack.* options, specifically pack.depth, pack.window, pack.windowMemory and pack.deltaCacheSize.
These settings do not give an exact limit: Git needs to map each object into memory, so one very large object can cause a lot of memory usage regardless of the window and delta cache settings.
You may have better luck packing locally and transferring the pack files to the remote side "manually", adding a .keep file for each one so that the remote git never tries to completely repack everything.
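As a rough sketch of the .keep step: when repacking, Git leaves alone any pack that has a matching pack-<hash>.keep file next to it. The helper below creates that marker for every pack in a directory. The function name keep_all_packs is made up for illustration, and the path in the usage comment is a placeholder.

```shell
# keep_all_packs DIR
# Create an empty pack-<hash>.keep marker beside every pack-<hash>.pack in DIR
# (normally <repo>/objects/pack for a bare repository).
keep_all_packs() {
    pack_dir=$1
    for pack in "$pack_dir"/pack-*.pack; do
        [ -e "$pack" ] || continue      # the glob matched nothing
        touch "${pack%.pack}.keep"      # touch leaves an existing marker intact
    done
}

# Hypothetical usage on the remote, after copying the packs over:
#   keep_all_packs /path/to/repo.git/objects/pack
```

Removing a .keep file later makes that pack eligible for repacking again.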
I used the instructions from this link. It is the same idea that Charles Bailey suggested.
A copy of the commands is here:
git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"
This worked for me on a HostGator shared hosting account.
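Note that --global changes the settings for every repository owned by that user. If only one repository needs the limits, the same keys can be set without --global, or passed for a single run with git -c. A minimal sketch, demonstrated on a throwaway repository so it can be pasted safely (the scratch path is illustrative; point -C at the real repository instead):

```shell
# Throwaway repository, for demonstration only.
scratch=$(mktemp -d)
git init --quiet "$scratch"

# Per-repository settings: written to that repository's own config file.
git -C "$scratch" config pack.windowMemory "100m"
git -C "$scratch" config pack.packSizeLimit "100m"
git -C "$scratch" config pack.threads "1"

# One-off alternative: override for a single gc run; nothing is written.
git -C "$scratch" -c pack.windowMemory=100m -c pack.threads=1 gc --quiet
```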
Git repack's memory use is: (pack.deltaCacheSize + pack.windowMemory) × pack.threads. Respective defaults are 256MiB, unlimited, nproc.
The delta cache isn't useful: most of the time is spent computing deltas on a sliding window, the majority of which are discarded; caching the survivors so they can be reused once (when writing) won't improve the runtime. That cache also isn't shared between threads.
By default the window memory is limited through pack.window (gc.aggressiveWindow). Limiting packing that way is a bad idea, because the working set size and efficiency will vary widely. It's best to raise both to much higher values and rely on pack.windowMemory to limit the window size.
Finally, threading has the disadvantage of splitting the working set. Lowering pack.threads and increasing pack.windowMemory so that the total stays the same should improve the run time.
repack has other useful tunables (pack.depth, pack.compression, the bitmap options), but they don't affect memory use.
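To make the formula at the top of this answer concrete, here is a small sketch in plain shell arithmetic. The function name repack_mem_mib and the sample values are illustrative, and the result is only the estimate given by that formula, not a hard ceiling.

```shell
# repack_mem_mib DELTA_CACHE_MIB WINDOW_MEMORY_MIB THREADS
# Prints (pack.deltaCacheSize + pack.windowMemory) x pack.threads, in MiB.
repack_mem_mib() {
    echo $(( ($1 + $2) * $3 ))
}

repack_mem_mib 1 100 4   # four threads, 100 MiB window each: prints 404
repack_mem_mib 1 403 1   # one thread, one 403 MiB window:    prints 404
```

Both calls give the same total, but the second hands a single thread one large window instead of splitting the budget across four smaller ones, which is the trade recommended above.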
You could turn off the delta attribute to disable delta compression for just the blobs of those pathnames:
In foo/.git/info/attributes (or foo.git/info/attributes if it is a bare repository) (see the delta entry in gitattributes and see gitignore for the pattern syntax):
/large_file_dir/* -delta
*.psd -delta
/data/*.iso -delta
/some/big/file -delta
another/file/that/is/large -delta
This will not affect clones of the repository. To affect other repositories (i.e. clones), put the attributes in a .gitattributes file instead of (or in addition to) the info/attributes file.
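One way to confirm that a pattern matches the paths you expect is git check-attr, which reports the attribute value Git would apply to a given path. A minimal sketch on a throwaway repository (the scratch path and the file names are made up for illustration):

```shell
# Throwaway repository, for demonstration only.
scratch=$(mktemp -d)
git init --quiet "$scratch"

# Same kind of entry as above, written to the repository-local attributes file.
mkdir -p "$scratch/.git/info"
echo '*.psd -delta' >> "$scratch/.git/info/attributes"

# Ask Git which value of "delta" applies to each path (files need not exist).
git -C "$scratch" check-attr delta -- design.psd notes.txt
```

A path covered by a -delta entry is reported as "unset"; a path that no entry matches is reported as "unspecified".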
Git 2.18 (Q2 2018) will improve the gc memory consumption.
Before 2.18, "git pack-objects" needs to allocate tons of "struct object_entry" while doing its work: shrinking its size helps the performance quite a bit.
This influences git gc.
See commit f6a5576, commit 3b13a5f, commit 0aca34e, commit ac77d0c, commit 27a7d06, commit 660b373, commit 0cb3c14, commit 898eba5, commit 43fa44f, commit 06af3bb, commit b5c0cbd, commit 0c6804a, commit fd9b1ba, commit 8d6ccce, commit 4c2db93 (14 Apr 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit ad635e8, 23 May 2018)
pack-objects: reorder members to shrink struct object_entry
Previous patches leave lots of holes and padding in this struct.
This patch reorders the members and shrinks the struct down to 80 bytes (from 136 bytes on 64-bit systems, before any field shrinking is done) with 16 bits to spare (and a couple more in in_pack_header_size when we really run out of bits).
This is the last in a series of memory reduction patches (see "pack-objects: a bit of document about struct object_entry" for the first one).
Overall they've reduced repack memory size on linux-2.6.git from 3.747G to 3.424G, or by around 320M, a decrease of 8.5%.
The runtime of repack has stayed the same throughout this series.
Ævar's testing on a big monorepo he has access to (bigger than linux-2.6.git) has shown a 7.9% reduction, so the overall expected improvement should be somewhere around 8%.
With Git 2.20 (Q4 2018), it is easier to check that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository.
See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018)
pack-objects: move 'layer' into 'struct packing_data'
This reduces the size of 'struct object_entry' from 88 bytes to 80 and therefore makes packing objects more efficient.
For example on a Linux repo with 12M objects, git pack-objects --all needs extra 96MB memory even if the layer feature is not used.
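The 96MB figure follows directly from the numbers in that commit message: 8 bytes saved per entry, times roughly 12 million entries. A quick check in shell arithmetic (the variable names are just for illustration, and MB is taken here as 10^6 bytes):

```shell
objects=12000000          # ~12M objects, as quoted above
old_size=88               # bytes per struct object_entry before the patch
new_size=80               # bytes per struct object_entry after the patch

saved_bytes=$(( objects * (old_size - new_size) ))
echo "$(( saved_bytes / 1000000 )) MB"   # prints "96 MB"
```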
Note that Git 2.21 (Feb. 2019) fixes a small bug: "git pack-objects" incorrectly used uninitialized mutex, which has been corrected.
See commit edb673c, commit 459307b (25 Jan 2019) by Patrick Hogg.
Helped-by: Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit d243a32, 05 Feb 2019)
pack-objects: move read mutex to packing_data struct
ac77d0c ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta().
Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas.
As such the read mutex is not guaranteed to be initialized.
Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data which is initialized in cmd_pack_objects.
Git 2.21 (Feb. 2019) adds yet another optimization: "git pack-objects" learns another algorithm to compute the set of objects to send, one that trades the size of the resulting packfile for a lower traversal cost, to favor small pushes.
pack-objects: create pack.useSparse setting
The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory.
The sparse algorithm is not recommended for a server, which likely sends new objects that appear across the entire working directory.
Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a '--sparse' flag all the way through four levels of run_command() calls.
If the '--no-sparse' flag is set, then this config setting is overridden.
The config pack documentation now includes:
pack.useSparse:
When true, Git will default to using the '--sparse' option in 'git pack-objects' when the '--revs' option is present.
This algorithm only walks trees that appear in paths that introduce new objects.
This can have significant performance benefits when computing a pack to send a small change.
However, it is possible that extra objects are added to the pack-file if the included commits contain certain types of direct renames.
See "git push is very slow for a huge repo" for a concrete illustration.
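For reference, a minimal sketch of how the pack.useSparse setting described above can be enabled, either persistently for one repository or for a single push. It is demonstrated on a throwaway repository (the scratch path is illustrative), and the push line is left as a comment because it needs a real remote.

```shell
# Throwaway repository, for demonstration only.
scratch=$(mktemp -d)
git init --quiet "$scratch"

# Persistently enable the sparse object-walk algorithm for this repository.
git -C "$scratch" config pack.useSparse true

# Read the value back, normalized as a boolean.
git -C "$scratch" config --type=bool pack.useSparse

# One-off alternative for a single push ("origin" and "main" are placeholders):
#   git -c pack.useSparse=true push origin main
```

As the quoted commit message notes, this is aimed at individual users pushing small changes rather than at servers.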
Source: https://stackoverflow.com/questions/3095737/is-there-a-way-to-limit-the-amount-of-memory-that-git-gc-uses