When I run git submodule update --init for the first time on a project which has a lot of submodules, it usually takes a lot of time, because most of the submodules are stored …
Update January 2016:
With Git 2.8 (Q1 2016), you will be able to fetch submodules in parallel (!) with git fetch --recurse-submodules -j2.
See "How to speed up / parallelize downloads of git submodules using git clone --recursive?"
Original answer (mid-2013):
You could try:
first, to initialize all submodules:
git submodule init
Then, use the foreach syntax:
git submodule foreach git submodule update --recursive -- $path &
If the '&' applies to the whole line (instead of just to the 'git submodule update --recursive -- $path' part), then you could call a script which would do the update in the background:
git submodule foreach git_submodule_update
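That git_submodule_update script is not shown here; a minimal sketch of what it could contain (an assumption, relying on the $path variable that git submodule foreach exports for each submodule, and on the script being on the PATH):

#!/bin/sh
# Hypothetical git_submodule_update helper: run the same update as
# above, but background it from inside the script, so the '&'
# unambiguously applies to the update command only.
git submodule update --recursive -- "$path" &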
This can also be done in Python. In Python 3 (because we're in 2015...), we can use something like this:
#!/usr/bin/env python3
import os
import re
import subprocess
import sys
from functools import partial
from multiprocessing import Pool
def list_submodules(path):
    # Parse the submodule paths out of .gitmodules
    with open(os.path.join(path, ".gitmodules"), 'r') as gitmodules:
        return re.findall(r"path = ([\w\-_/]+)", gitmodules.read())

def update_submodule(name, path):
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(2)
    root_path = sys.argv[1]
    p = Pool()
    p.map(partial(update_submodule, path=root_path), list_submodules(root_path))
This may be safer than the one-liner given by @Karmazzin (since that one just keeps spawning processes without any control over how many are spawned), but it follows the same logic: read .gitmodules, then spawn multiple processes running the proper git command, here using a process pool (whose maximum size can also be set). The path to the cloned repository needs to be passed as an argument. This was tested extensively on a repository with around 700 submodules.
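Assuming the script above is saved as update_submodules.py (the filename is just an example, not from the original answer), the invocation would look like:

python3 update_submodules.py /path/to/cloned/repo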
Note that in the case of a submodule initialization, each process will try to write to .git/config, and locking issues may happen:
error: could not lock config file .git/config: File exists
Failed to register url for submodule path '...'
This can be caught with subprocess.check_output and a try/except subprocess.CalledProcessError block, which is cleaner than the sleep added to @Karmazzin's method. An updated method could look like:
def update_submodule(name, path):
    cmd = ["git", "-C", path, "submodule", "update", "--init", name]
    while True:
        try:
            subprocess.check_output(cmd, stderr=subprocess.PIPE, shell=False)
            return
        except subprocess.CalledProcessError as e:
            if b"could not lock config file .git/config: File exists" in e.stderr:
                # Another worker holds the .git/config lock: retry
                continue
            else:
                raise e
With this, I managed to run the init/update of 700 submodules during a Travis build without the need to limit the size of the process pool. I often see a few locks caught that way (~3 max).
As of Git 2.8, you can do this:
git submodule update --init --jobs 4
where 4 is the number of submodules to download in parallel.
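If you do not want to pass the flag each time, the same parallelism can be set persistently through the submodule.fetchJobs configuration variable (introduced alongside --jobs in Git 2.8), for example:

git config --global submodule.fetchJobs 4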
Linux:
cat .gitmodules | grep -Po '".*"' | sed 's/.\(.\+\).$/\1/' | while sleep 0.1 && read line; do git submodule update --init $line & done
Mac:
cat .gitmodules | grep -o '".*"' | cut -d '"' -f 2 | while sleep 0.1 && read line; do git submodule update --init $line & done
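Both one-liners parse the quoted submodule names out of .gitmodules by hand, which breaks if the file layout changes. A sketch of a variant (not from the original answers) that lets git itself do the parsing, and waits for all the background updates to finish:

for p in $(git config --file .gitmodules --get-regexp path | awk '{ print $2 }'); do git submodule update --init "$p" & done; wait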