Ways to improve git status performance

遇见更好的自我 2020-12-02 06:51

I have a 10 GB repo on NFS, accessed from a Linux machine. The first git status takes 36 minutes, and each subsequent git status takes 8 minutes. Se

10 Answers
  • 2020-12-02 07:05

    The performance of git status should improve with Git 2.13 (Q2 2017).

    See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
    (Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)

    > string-list: use ALLOC_GROW macro when reallocing string_list

    Use ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32.
    This is a performance optimization.

    During status on a very large repo and there are many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.

    This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.
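    A rough cost model shows why the growth strategy matters. It assumes every reallocation copies the whole array (realloc can sometimes extend a block in place, so this is a worst case), and that ALLOC_GROW sizes the array with Git's alloc_nr(x) = ((x) + 16) * 3 / 2 macro, i.e. roughly 1.5x geometric growth, as I understand Git's source. For n entries:

    ```latex
    % Fixed increment of 32 (old behaviour): about n/32 reallocations,
    % and the k-th one copies about 32k entries.
    C_{+32}(n) \approx \sum_{k=1}^{n/32} 32k
              = 16\,\frac{n}{32}\left(\frac{n}{32}+1\right)
              = \frac{n^2}{64} + \frac{n}{2} = O(n^2)

    % Geometric growth by 3/2: about \log_{3/2} n reallocations, and the
    % copy sizes form a geometric series bounded by the final size n.
    C_{\times 3/2}(n) \lesssim n\left(1 + \tfrac{2}{3} + \left(\tfrac{2}{3}\right)^2 + \cdots\right)
                     = \frac{n}{1 - 2/3} = 3n = O(n)
    ```

    For n = 10^6 changed paths that is on the order of 1.6 × 10^10 element copies versus about 3 × 10^6. This is only a model: the measured drop from 125 s to 45 s also includes all the non-reallocation work in wt_status_collect_changes_worktree().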


    Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in the index-heavy operations.

    See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
    (Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)

    trace: measure where the time is spent in the index-heavy operations

    All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not.
    An unoptimized git-status would give something like below:

    0.001791141 s: read cache ...
    0.004011363 s: preload index
    0.000516161 s: refresh index
    0.003139257 s: git command: ... 'status' '--porcelain=2'
    0.006788129 s: diff-files
    0.002090267 s: diff-index
    0.001885735 s: initialize name hash
    0.032013138 s: read directory
    0.051781209 s: git command: './git' 'status'
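    A minimal sketch of how to get a timing breakdown like the one above (Git 2.17+). It uses a throwaway repository so it runs anywhere; in practice, cd into the slow repository and run only the GIT_TRACE_PERFORMANCE line. The temporary file name is an arbitrary choice for illustration.

    ```shell
    set -e
    # Throwaway repository for demonstration purposes.
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    echo hello > file.txt

    # GIT_TRACE_PERFORMANCE=1 prints the timings to stderr; an absolute
    # path makes Git append them to that file instead.
    trace=$(mktemp)
    GIT_TRACE_PERFORMANCE="$trace" git status
    cat "$trace"
    ```

    Comparing the "read directory" line with the index-related lines tells you whether the untracked-file scan or the index work dominates on your NFS mount.
    
    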
    

    The same Git 2.17 (Q2 2018) improves git status with:

    • commit f39a757, commit 3ca1897, commit fd9b544, commit d7d1b49 (09 Jan 2018) by Jeff Hostetler (jeffhostetler).
      (Merged by Junio C Hamano -- gitster -- in commit 4094e47, 08 Mar 2018)
      "git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with "--no-ahead-behind" option.

    • commit ebbed3b (25 Feb 2018) by Derrick Stolee (derrickstolee).

    revision.c: reduce object database queries

    In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.

    Modify the condition to only check has_object_file() if the result would change the parsed bit.

    When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts.
    This uses paint_down_to_common() and hits mark_parents_uninteresting().

    On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.
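    A small sketch of the difference the --no-ahead-behind option makes. It builds a throwaway "upstream" repository and a clone with one extra local commit, so the branch diverges from its upstream; the demo user name and e-mail are placeholders.

    ```shell
    set -e
    tmp=$(mktemp -d)
    # Throwaway upstream repository with a single commit.
    git init -q "$tmp/upstream"
    git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "initial"

    # Clone it and add a local commit so the branch is ahead of its upstream.
    git clone -q "$tmp/upstream" "$tmp/clone"
    cd "$tmp/clone"
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "local"

    # Default: Git walks history to count commits ahead/behind.
    full=$(git status)
    # Git 2.17+: only report whether the two commits differ.
    quick=$(git status --no-ahead-behind)
    printf '%s\n---\n%s\n' "$full" "$quick"
    ```

    On a branch that is tens of thousands of commits away from its upstream, the skipped history walk is where the savings come from.
    
    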


    Git 2.24 (Q4 2019) proposes another setting to improve git status performance:

    See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
    (Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

    repo-settings: create feature.manyFiles setting

    The feature.manyFiles setting is suitable for repos with many files in the working directory.
    By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
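    A minimal sketch of turning this on, shown on a throwaway repository; in your own repository only the git config and git update-index lines are needed. Setting the two individual options after feature.manyFiles is redundant and only spells out what the commit message says the feature implies.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    echo hello > file.txt
    git add file.txt

    # One switch for repositories with many files (Git 2.24+) ...
    git config feature.manyFiles true
    # ... or, per the commit message above, the two settings it implies:
    git config index.version 4
    git config core.untrackedCache true

    # index.version only applies when a new index file is initialized,
    # so rewrite the existing index in version 4 format explicitly.
    git update-index --index-version 4
    git status
    ```
    
    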

    But:

    With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.

    See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
    (Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)

    repo-settings: read an int for index.version

    Signed-off-by: Derrick Stolee

    Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).

    Unfortunately, that file looked like a lot of boilerplate and what is clearly a factor of copy-paste overload, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.

    I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.

    This was not caught by the codebase because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings to not be overridden by feature.manyFiles or GIT_INDEX_VERSION.
    While the "default" version is 3, this is demoted to version 2 in do_write_index() when not necessary.
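    To check which index version actually ended up on disk (the symptom of the bug above was a silent fallback from 4 to 3), one can read the header of .git/index directly. This sketch relies on the index file layout: a 4-byte "DIRC" signature followed by a 4-byte big-endian version number, so the version's low byte sits at offset 7. It uses a throwaway repository.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    echo hello > file.txt

    # Create a brand-new index while index.version=4 is in effect.
    git -c index.version=4 add file.txt

    # Bytes 0-3: "DIRC"; bytes 4-7: version in network byte order.
    version=$(od -An -tu1 -j7 -N1 .git/index | tr -d ' ')
    echo "index version: $version"
    ```

    Running the same od command in your own repository after enabling feature.manyFiles shows whether the setting really took effect.
    
    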

  • 2020-12-02 07:10

    Try git gc. Also, git clean may help.

    UPDATE - Not sure where the downvote came from, but the git gc manual specifically states:

    Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.

    Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

    I always notice a difference after running git gc when git status is slow!

    UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!
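    A small sketch of this housekeeping on a throwaway repository. git count-objects -v reports the number of loose objects ("count:") before and after git gc packs them. Because git clean deletes untracked files, the sketch only runs it as a dry run (-n); -f is needed to actually delete anything.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    echo tracked > tracked.txt
    git add tracked.txt
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
    echo scratch > untracked.txt

    git count-objects -v        # loose objects before packing
    git gc --quiet              # pack reachable objects, prune old unreachable ones
    after=$(git count-objects -v)
    echo "$after"

    # Dry run: list the untracked files git clean would remove.
    preview=$(git clean -n)
    echo "$preview"
    ```
    
    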

  • 2020-12-02 07:11

    I'm also seeing this problem on a large project shared over NFS.

    It took me some time to discover the flag -uno that can be given to both git commit and git status.

    This flag disables the search for untracked files, which significantly reduces the number of NFS operations. To discover untracked files, Git has to look in every subdirectory, so if you have many subdirectories this will hurt you. By telling Git not to look for untracked files, you eliminate all of these NFS operations.

    Combine this with the core.preloadindex flag and you can get reasonable performance even on NFS.
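    A sketch comparing the two modes on a throwaway repository with one modified tracked file and one untracked file. The last two lines make the combination described above persistent for that repository; status.showUntrackedFiles=no is the config equivalent of passing -uno every time.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    echo tracked > tracked.txt
    git add tracked.txt
    echo change >> tracked.txt        # modified tracked file
    echo scratch > untracked.txt      # untracked file

    full=$(git status --porcelain)        # scans every directory for untracked files
    fast=$(git status --porcelain -uno)   # -uno: skip the untracked-file search
    printf '%s\n---\n%s\n' "$full" "$fast"

    # Make this the default for the repository.
    git config status.showUntrackedFiles no
    git config core.preloadIndex true
    ```

    The trade-off is that new files no longer show up in git status, so you have to remember to git add them yourself.
    
    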

  • 2020-12-02 07:11

    Something that hasn't been mentioned yet is activating the filesystem cache on Windows machines. Linux filesystems are completely different and Git was optimized for them, so this probably only helps on Windows.

    git config core.fscache true
    


    As a last resort, if Git is still slow, one could turn off the modification-time checks that Git uses to find out which files have changed.

    git config core.ignoreStat true
    

    BUT: changed files then have to be added manually with git add, because Git no longer detects the changes by itself.
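    A sketch of what core.ignoreStat means in practice, on a throwaway repository. Entries that Git writes while the option is on get the "assume unchanged" bit, which git ls-files -v shows as a lowercase tag; git update-index --really-refresh checks every entry regardless of that bit, and an explicit git add still stages the new content.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    git config core.ignoreStat true

    echo v1 > file.txt
    git add file.txt
    # Lowercase "h" = tracked and marked assume-unchanged.
    flags=$(git ls-files -v)
    echo "$flags"

    echo v2 >> file.txt
    # Git trusts the index instead of lstat(), so this may not report the edit.
    git status --short
    # Force a real stat check of all entries; it may exit non-zero when
    # something needs updating, hence "|| true".
    git update-index --really-refresh || true
    # Staging explicitly still picks up the new content.
    git add file.txt
    ```
    
    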

    source

  • 2020-12-02 07:12

    git config --global core.preloadIndex true

    Did the job for me. Check the official documentation here.

  • 2020-12-02 07:23

    In our codebase, where we have somewhere in the range of 20-30 submodules,
    git status --ignore-submodules
    sped things up for me drastically. Do note that this will not report on the status of submodules.
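    The flag also takes a value, which allows a middle ground between checking submodules fully and not at all. A sketch, run in a throwaway repository only so it executes anywhere; in practice run it in the superproject.

    ```shell
    set -e
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"

    # No value means "all": submodule changes are ignored entirely.
    git status --ignore-submodules
    # "dirty": skip scanning submodule work trees, but still report a submodule
    # whose checked-out commit differs from the one recorded in the superproject.
    git status --ignore-submodules=dirty
    # "untracked": only untracked files inside submodules are ignored.
    git status --ignore-submodules=untracked
    ```

    If only a few submodules are slow, the per-submodule submodule.<name>.ignore setting can apply the same choice to just those.
    
    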
