Git submodule init async

清酒与你 2021-02-19 16:02

When I run git submodule update --init for the first time on a project that has a lot of submodules, it usually takes a long time, because most of the submodules are stor

4 Answers
  •  遥遥无期
    2021-02-19 16:52

    This can also be done in Python. In Python 3 (because we're in 2015...), we can use something like this:

    #!/usr/bin/env python3
    
    import os
    import re
    import subprocess
    import sys
    from functools import partial
    from multiprocessing import Pool
    
    def list_submodules(path):
        # Collect every "path = ..." entry from the repository's .gitmodules file.
        with open(os.path.join(path, ".gitmodules"), 'r') as gitmodules:
            return re.findall(r"path = ([\w\-/]+)", gitmodules.read())
    
    
    def update_submodule(name, path):
        cmd = ["git", "-C", path, "submodule", "update", "--init", name]
        return subprocess.call(cmd, shell=False)
    
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            sys.exit(2)
        root_path = sys.argv[1]
    
        # One worker per CPU by default; the pool is cleaned up on exit.
        with Pool() as p:
            p.map(partial(update_submodule, path=root_path), list_submodules(root_path))
    

    This may be safer than the one-liner given by @Karmazzin, which keeps spawning processes with no limit on how many run at once. It follows the same logic (read .gitmodules, then run the appropriate git command in multiple processes), but uses a process pool, whose maximum size can also be set. The path to the cloned repository must be passed as an argument. This was tested extensively on a repository with around 700 submodules.
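
    As a rough illustration of the two steps above (read the submodule paths, then run work with a capped number of workers), here is a minimal sketch. The names parse_submodule_paths, run_bounded, max_workers and the sample .gitmodules text are hypothetical and not part of the original script. It also swaps in a thread pool for the process pool, on the assumption that each worker only waits on a git subprocess.

```python
import re
from multiprocessing.pool import ThreadPool

# Hypothetical .gitmodules content, for illustration only.
SAMPLE_GITMODULES = """\
[submodule "libs/foo"]
    path = libs/foo
    url = https://example.com/foo.git
[submodule "libs/bar-baz"]
    path = libs/bar-baz
    url = https://example.com/bar-baz.git
"""


def parse_submodule_paths(text):
    """Return the value of every `path = ...` line in .gitmodules text."""
    return re.findall(r"^\s*path\s*=\s*(.+?)\s*$", text, flags=re.MULTILINE)


def run_bounded(func, items, max_workers=4):
    """Apply func to each item with at most max_workers running at once."""
    # Assumption: threads are enough here because the real work would be
    # done by external git processes, not by Python code.
    with ThreadPool(processes=max_workers) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    paths = parse_submodule_paths(SAMPLE_GITMODULES)
    # Stand-in for the real update function: just report each path's length.
    print(run_bounded(len, paths, max_workers=2))
```

    In the original script, the equivalent cap would be passing a number to Pool(), for example Pool(8).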

    Note that during submodule initialization, each process tries to write to .git/config, so lock conflicts can occur:

    error: could not lock config file .git/config: File exists

    Failed to register url for submodule path '...'

    This can be caught with subprocess.check_output and a try/except subprocess.CalledProcessError block, which is cleaner than the sleep added to @Karmazzin's method. An updated method could look like:

    def update_submodule(name, path):
        cmd = ["git", "-C", path, "submodule", "update", "--init", name]
        while True:
            try:
                subprocess.check_output(cmd, stderr=subprocess.PIPE, shell=False)
                return
            except subprocess.CalledProcessError as e:
                # Retry only when another process held the .git/config lock.
                if b"could not lock config file .git/config: File exists" not in e.stderr:
                    raise
    

    With this, I managed to run the init/update of 700 submodules during a Travis build without the need to limit the size of the process pool. I often see a few locks caught that way (~3 max).
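
    The loop above retries indefinitely. If a hard upper bound is preferred, one possible variation (not the method used in this answer) is to cap the number of attempts and pause briefly between them. In this sketch, run_with_lock_retry, max_attempts, delay and LOCK_MESSAGE are hypothetical names chosen for illustration.

```python
import subprocess
import time

# Substring of the error git prints when .git/config is locked (quoted above).
LOCK_MESSAGE = b"could not lock config file"


def run_with_lock_retry(cmd, max_attempts=10, delay=0.1):
    """Run cmd, retrying only on the config-lock error.

    Returns the number of attempts used. Raises CalledProcessError for any
    other failure, or when the lock error persists after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return attempt
        lock_error = LOCK_MESSAGE in result.stderr
        if not lock_error or attempt == max_attempts:
            raise subprocess.CalledProcessError(
                result.returncode, cmd, output=result.stdout, stderr=result.stderr
            )
        # Wait a little longer after each failed attempt before retrying.
        time.sleep(delay * attempt)
```

    It could be called with the same command list as above, for example run_with_lock_retry(["git", "-C", path, "submodule", "update", "--init", name]).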
