How to sparsely checkout only one single file from a git repository?

Happy的楠姐 2020-11-22 08:14

How do I checkout just one file from a git repo?

21 answers
  •  慢半拍i (original poster)
    2020-11-22 08:53

    git clone --filter from Git 2.19

    This option will actually skip fetching most unneeded objects from the server:

    git clone --depth 1 --no-checkout --filter=blob:none \
      "file://$(pwd)/server_repo" local_repo
    cd local_repo
    git checkout master -- mydir/myfile
    

    The server should be configured with:

    git config --local uploadpack.allowfilter 1
    git config --local uploadpack.allowanysha1inwant 1
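    The clone and checkout steps above can be wrapped in a small reusable function. This is only a sketch: fetch_one_file is a hypothetical name, and it assumes the remote already has the two uploadpack settings shown above and that the requested branch exists there.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of Git).
# Usage: fetch_one_file URL BRANCH PATH DEST_DIR
# Assumes the remote sets uploadpack.allowfilter and uploadpack.allowanysha1inwant.
fetch_one_file() {
  local url=$1 branch=$2 path=$3 dest=$4
  # Shallow, blobless clone: one commit plus its trees, no file contents,
  # and no working tree yet.
  git clone --quiet --depth 1 --no-checkout --filter=blob:none \
    --branch "$branch" "$url" "$dest"
  # Checking out a single path lazily fetches only that blob from the remote.
  git -C "$dest" checkout --quiet "$branch" -- "$path"
}
```

    For example: fetch_one_file "file://$(pwd)/server_repo" master mydir/myfile local_repo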
    

    There is no server support as of v2.19.0, but it can already be locally tested.

    TODO: --filter=blob:none skips all blobs, but still fetches all tree objects. On a normal repo the trees should be tiny compared to the files themselves, so this is already good enough. I asked at https://www.spinics.net/lists/git/msg342006.html and the developers replied that a --filter=tree:0 filter is in the works to do that.
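    In Git versions newer than the 2.19.0 discussed here, that filter is available as --filter=tree:0 (believed to have landed around Git 2.20; check that both client and server support it). The sketch below is the same idea with the treeless filter; fetch_one_file_treeless is a hypothetical name. The trees needed for the requested path are then fetched on demand during checkout instead of up front.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper; assumes both ends support --filter=tree:0.
# Usage: fetch_one_file_treeless URL BRANCH PATH DEST_DIR
fetch_one_file_treeless() {
  local url=$1 branch=$2 path=$3 dest=$4
  # tree:0 omits every tree and blob, so the clone transfers only the commit.
  git clone --quiet --depth 1 --no-checkout --filter=tree:0 \
    --branch "$branch" "$url" "$dest"
  # The missing trees and the blob for $path are fetched lazily here.
  git -C "$dest" checkout --quiet "$branch" -- "$path"
}
```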

    Remember that --depth 1 already implies --single-branch; see also: How do I clone a single branch in Git?
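    One way to see this is to look at the fetch refspec that a shallow clone writes. A sketch, with shallow_fetch_refspec as a hypothetical name; the URL is assumed to point at a repository with several branches, such as the server_repo built by the test script below:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: clone with --depth 1 and print the configured refspec.
# Usage: shallow_fetch_refspec URL DEST_DIR
shallow_fetch_refspec() {
  local url=$1 dest=$2
  git clone --quiet --depth 1 --no-checkout "$url" "$dest"
  # A full clone would print +refs/heads/*:refs/remotes/origin/*; with
  # --depth 1 only the cloned branch (the remote HEAD by default) is tracked.
  git -C "$dest" config --get-all remote.origin.fetch
}
```

    For a remote whose HEAD is master, this should print +refs/heads/master:refs/remotes/origin/master even though other branches exist.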

    file://$(pwd) is required to overcome git clone protocol shenanigans: with a plain local path, git clone takes its local-clone shortcut and ignores --depth. See: How to shallow clone a local git repository with a relative path?
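    A quick way to observe the difference, as a sketch: is_shallow_clone is a hypothetical name, and the source is assumed to be a local repository with more than one commit, since a one-commit history has nothing to cut off.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: shallow-clone SRC into DEST and report whether
# the result really is shallow ("true" or "false").
is_shallow_clone() {
  local src=$1 dest=$2
  # For a plain local path Git warns that --depth is ignored; hide that warning.
  git clone --quiet --depth 1 "$src" "$dest" 2>/dev/null
  git -C "$dest" rev-parse --is-shallow-repository
}
```

    With that, is_shallow_clone server_repo a should print false, while is_shallow_clone "file://$(pwd)/server_repo" b should print true.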

    The format of --filter is documented in man git-rev-list.
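    Since git rev-list accepts the same filter specs, you can preview on the server side what a given filter would leave out. A sketch, with count_omitted as a hypothetical name; --filter-print-omitted prints each omitted object ID prefixed with ~:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: count objects that FILTER would omit from REPO.
# Usage: count_omitted REPO FILTER   (e.g. blob:none, blob:limit=1k)
count_omitted() {
  local repo=$1 filter=$2
  # Omitted objects are printed with a leading "~"; count those lines.
  # grep -c exits 1 when the count is 0, so "|| true" keeps set -e happy.
  git -C "$repo" rev-list --all --objects \
    --filter="$filter" --filter-print-omitted | grep -c '^~' || true
}
```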

    An extension was made to the Git remote protocol to support this feature.

    Docs in the Git source tree:

    • https://github.com/git/git/blob/v2.19.0/Documentation/technical/partial-clone.txt
    • https://github.com/git/git/blob/v2.19.0/Documentation/rev-list-options.txt#L720
    • https://github.com/git/git/blob/v2.19.0/t/t5616-partial-clone.sh

    Test it out

    #!/usr/bin/env bash
    set -eu
    
    list-objects() (
      git rev-list --all --objects
      echo "master commit SHA: $(git log -1 --format="%H" master)"
      echo "mybranch commit SHA: $(git log -1 --format="%H" mybranch)"
      git ls-tree master
      git ls-tree mybranch | grep mybranch
      git ls-tree master~ | grep root
    )
    
    # Reproducibility.
    export GIT_COMMITTER_NAME='a'
    export GIT_COMMITTER_EMAIL='a'
    export GIT_AUTHOR_NAME='a'
    export GIT_AUTHOR_EMAIL='a'
    export GIT_COMMITTER_DATE='2000-01-01T00:00:00+0000'
    export GIT_AUTHOR_DATE='2000-01-01T00:00:00+0000'
    
    rm -rf server_repo local_repo
    mkdir server_repo
    cd server_repo
    
    # Create repo.
    git init --quiet
    git config --local uploadpack.allowfilter 1
    git config --local uploadpack.allowanysha1inwant 1
    
    # First commit.
    # Directories present in all branches.
    mkdir d1 d2
    printf 'd1/a' > ./d1/a
    printf 'd1/b' > ./d1/b
    printf 'd2/a' > ./d2/a
    printf 'd2/b' > ./d2/b
    # Present only in root.
    mkdir 'root'
    printf 'root' > ./root/root
    git add .
    git commit -m 'root' --quiet
    
    # Second commit only on master.
    git rm --quiet -r ./root
    mkdir 'master'
    printf 'master' > ./master/master
    git add .
    git commit -m 'master commit' --quiet
    
    # Second commit only on mybranch.
    git checkout -b mybranch --quiet master~
    git rm --quiet -r ./root
    mkdir 'mybranch'
    printf 'mybranch' > ./mybranch/mybranch
    git add .
    git commit -m 'mybranch commit' --quiet
    
    echo "# List and identify all objects"
    list-objects
    echo
    
    # Restore master.
    git checkout --quiet master
    cd ..
    
    # Clone. Don't checkout for now, only .git/ dir.
    git clone --depth 1 --quiet --no-checkout --filter=blob:none "file://$(pwd)/server_repo" local_repo
    cd local_repo
    
    # List missing objects from master.
    echo "# Missing objects after --no-checkout"
    git rev-list --all --quiet --objects --missing=print
    echo
    
    echo "# Git checkout fails without internet"
    mv ../server_repo ../server_repo.off
    ! git checkout master
    echo
    
    echo "# Git checkout fetches the missing file from internet"
    mv ../server_repo.off ../server_repo
    git checkout master -- d1/a
    echo
    
    echo "# Missing objects after checking out d1/a"
    git rev-list --all --quiet --objects --missing=print
    

    GitHub upstream.

    Output in Git v2.19.0:

    # List and identify all objects
    c6fcdfaf2b1462f809aecdad83a186eeec00f9c1
    fc5e97944480982cfc180a6d6634699921ee63ec
    7251a83be9a03161acde7b71a8fda9be19f47128
    62d67bce3c672fe2b9065f372726a11e57bade7e
    b64bf435a3e54c5208a1b70b7bcb0fc627463a75 d1
    308150e8fffffde043f3dbbb8573abb6af1df96e63 d1/a
    f70a17f51b7b30fec48a32e4f19ac15e261fd1a4 d1/b
    84de03c312dc741d0f2a66df7b2f168d823e122a d2
    0975df9b39e23c15f63db194df7f45c76528bccb d2/a
    41484c13520fcbb6e7243a26fdb1fc9405c08520 d2/b
    7d5230379e4652f1b1da7ed1e78e0b8253e03ba3 master
    8b25206ff90e9432f6f1a8600f87a7bd695a24af master/master
    ef29f15c9a7c5417944cc09711b6a9ee51b01d89
    19f7a4ca4a038aff89d803f017f76d2b66063043 mybranch
    1b671b190e293aa091239b8b5e8c149411d00523 mybranch/mybranch
    c3760bb1a0ece87cdbaf9a563c77a45e30a4e30e
    a0234da53ec608b54813b4271fbf00ba5318b99f root
    93ca1422a8da0a9effc465eccbcb17e23015542d root/root
    master commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
    mybranch commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
    040000 tree b64bf435a3e54c5208a1b70b7bcb0fc627463a75    d1
    040000 tree 84de03c312dc741d0f2a66df7b2f168d823e122a    d2
    040000 tree 7d5230379e4652f1b1da7ed1e78e0b8253e03ba3    master
    040000 tree 19f7a4ca4a038aff89d803f017f76d2b66063043    mybranch
    040000 tree a0234da53ec608b54813b4271fbf00ba5318b99f    root
    
    # Missing objects after --no-checkout
    ?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
    ?8b25206ff90e9432f6f1a8600f87a7bd695a24af
    ?41484c13520fcbb6e7243a26fdb1fc9405c08520
    ?0975df9b39e23c15f63db194df7f45c76528bccb
    ?308150e8fffffde043f3dbbb8573abb6af1df96e63
    
    # Git checkout fails without internet
    fatal: '/home/ciro/bak/git/test-git-web-interface/other-test-repos/partial-clone.tmp/server_repo' does not appear to be a git repository
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.
    
    # Git checkout fetches the missing directory from internet
    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 0 (delta 0)
    Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.
    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 0 (delta 0)
    Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.
    
    # Missing objects after checking out d1
    ?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
    ?8b25206ff90e9432f6f1a8600f87a7bd695a24af
    ?41484c13520fcbb6e7243a26fdb1fc9405c08520
    ?0975df9b39e23c15f63db194df7f45c76528bccb
    

    Conclusions: all blobs except d1/a are missing. E.g. f70a17f51b7b30fec48a32e4f19ac15e261fd1a4, which is d1/b, is still missing after checking out d1/a.

    Note that root/root and mybranch/mybranch are also missing, but --depth 1 hides them from the list of missing objects. If you remove --depth 1, they show up in that list.
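    To turn the --missing=print listings above into a single number, here is a small sketch (count_missing is a hypothetical name) that counts the lines git rev-list prefixes with ?:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: number of objects reachable from REPO's refs that
# are not present locally (e.g. blobs skipped by --filter=blob:none).
count_missing() {
  local repo=$1
  # --missing=print lists missing objects with a "?" prefix instead of failing.
  git -C "$repo" rev-list --all --objects --missing=print \
    | grep -c '^?' || true
}
```

    Running it in local_repo before and after git checkout master -- d1/a should show the drop from five to four missing objects seen in the output above.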
