Limiting file size in git repository

星月不相逢 2020-12-02 14:55

I'm currently thinking of changing my VCS from Subversion to Git. Is it possible to limit the file size within a commit in a Git repository? For e.g. Subversion there is

11 answers
  • 2020-12-02 15:17

    I want to highlight another set of approaches that address this issue at the pull request stage: GitHub Actions and Apps. They don't stop large files from being committed to a branch, but if the files are removed from the branch's history before the merge (for example by rewriting the offending commits or by squash-merging), the resulting base branch will not have the large files in its history.

    There's a recently developed action that checks the added file sizes (through the GitHub API) against a user-defined reference value: lfs-warning.

    I've also personally hacked together a Probot app to screen for large file sizes in a PR (against a user-defined value), but it's much less efficient: sizeCheck
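
    The idea behind such a pull request check can be sketched locally with plain git. This is not how lfs-warning or sizeCheck are implemented (the action above is described as using the GitHub API); it is only a minimal sketch, and the function name large_files_in_diff is hypothetical. It compares the two end states, so it will not see a file that was added and later deleted within the branch:

```shell
#!/bin/bash
# Hypothetical helper: print "<size><TAB><path>" for every file that was
# added or modified between two commits and is larger than max_bytes.
# Only the end states are compared, not the commits in between.
large_files_in_diff() {
  local base="$1" head="$2" max_bytes="$3"
  local path size
  # core.quotePath=false keeps non-ASCII paths unescaped; paths containing
  # newlines or double quotes are still quoted by git and are not handled here.
  git -c core.quotePath=false diff --name-only --diff-filter=AM "$base" "$head" |
    while IFS= read -r path; do
      # Size in bytes of the blob as it exists in the head commit.
      size="$(git cat-file -s "${head}:${path}")" || continue
      if [ "$size" -gt "$max_bytes" ]; then
        printf '%s\t%s\n' "$size" "$path"
      fi
    done
}
```

    In a CI job this could be called as large_files_in_diff "$base_sha" "$head_sha" 5242880, where both variables are placeholders for whatever the CI system provides, failing the job when the output is non-empty.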

  • 2020-12-02 15:19

    This one is pretty good:

    #!/bin/bash -u
    #
    # git-max-filesize
    #
    # git pre-receive hook to reject large files that should be committed
    # via git-lfs (large file support) instead.
    #
    # Author: Christoph Hack <chack@mgit.at>
    # Copyright (c) 2017 mgIT GmbH. All rights reserved.
    # Distributed under the Apache License. See LICENSE for details.
    #
    set -o pipefail
    
    readonly DEFAULT_MAXSIZE="5242880" # 5MB
    readonly CONFIG_NAME="hooks.maxfilesize"
    readonly NULLSHA="0000000000000000000000000000000000000000"
    readonly EXIT_SUCCESS="0"
    readonly EXIT_FAILURE="1"
    
    # main entry point
    function main() {
      local status="$EXIT_SUCCESS"
    
      # get maximum filesize (from repository-specific config)
      local maxsize
      maxsize="$(get_maxsize)"
      if [[ "$?" != 0 ]]; then
        echo "failed to get ${CONFIG_NAME} from config"
        exit "$EXIT_FAILURE"
      fi
    
      # skip this hook entirely if maxsize is 0.
      if [[ "$maxsize" == 0 ]]; then
        cat > /dev/null
        exit "$EXIT_SUCCESS"
      fi
    
      # read lines from stdin (format: "<oldref> <newref> <refname>\n")
      local oldref
      local newref
      local refname
      while read oldref newref refname; do
        # skip branch deletions
        if [[ "$newref" == "$NULLSHA" ]]; then
          continue
        fi
    
        # find large objects
        # check all objects from $oldref (possible $NULLSHA) to $newref, but
        # skip all objects that have already been accepted (i.e. are referenced by
        # another branch or tag).
        local target
        if [[ "$oldref" == "$NULLSHA" ]]; then
          target="$newref"
        else
          target="${oldref}..${newref}"
        fi
        local large_files
        large_files="$(git rev-list --objects "$target" --not --branches=\* --tags=\* | \
          git cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' | \
          awk -F '\t' -v maxbytes="$maxsize" '$3 > maxbytes' | cut -f 4-)"
        if [[ "$?" != 0 ]]; then
          echo "failed to check for large files in ref ${refname}"
          continue
        fi
    
        IFS=$'\n'
        for file in $large_files; do
          if [[ "$status" == 0 ]]; then
            echo ""
            echo "-------------------------------------------------------------------------"
            echo "Your push was rejected because it contains files larger than $(numfmt --to=iec "$maxsize")."
            echo "Please use https://git-lfs.github.com/ to store larger files."
            echo "-------------------------------------------------------------------------"
            echo ""
            echo "Offending files:"
            status="$EXIT_FAILURE"
          fi
          echo " - ${file} (ref: ${refname})"
        done
        unset IFS
      done
    
      exit "$status"
    }
    
    # get the maximum filesize configured for this repository or the default
    # value if no specific option has been set. Suffixes like 5k, 5m, 5g, etc.
    # can be used (see git config --int).
    function get_maxsize() {
      local value;
      value="$(git config --int "$CONFIG_NAME")"
      if [[ "$?" != 0 ]] || [[ -z "$value" ]]; then
        echo "$DEFAULT_MAXSIZE"
        return "$EXIT_SUCCESS"
      fi
      echo "$value"
      return "$EXIT_SUCCESS"
    }
    
    main
    

    You can configure the size in the server-side config file by adding:

    [hooks]
            maxfilesize = 1048576 # 1 MiB
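
    The same key can also be set with git config instead of editing the file by hand. A minimal sketch, assuming the hook above is installed in the repository passed as the first argument; the function name set_max_filesize is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: store the limit in the repository's config and print
# the value the hook will see. "git config --int" expands the k/m/g
# suffixes, so a limit of "1m" is read back as 1048576.
set_max_filesize() {
  local repo_dir="$1" limit="$2"
  git -C "$repo_dir" config hooks.maxfilesize "$limit" || return 1
  git -C "$repo_dir" config --int hooks.maxfilesize
}
```

    For example, set_max_filesize /path/to/project.git 1m (the path is a placeholder) sets a 1 MiB limit. According to the script above, a value of 0 disables the check.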
    
  • 2020-12-02 15:22

    The answers by eis and J-16 SDiZ suffer from a severe problem. They only check the state of the final commit, $3 (or $newrev). In the update hook they also need to check what is being submitted in the other commits between $2 (or $oldrev) and $3 (or $newrev).

    J-16 SDiZ is closer to the right answer.

    The big flaw is that someone whose departmental server has this update hook installed to protect it will find out the hard way that:

    If someone accidentally commits a big file and then removes it with git rm, the current tree (the last commit) looks fine, so the hook accepts the push. The push still brings in the entire chain of commits, including the one that added the big file, and the result is a bloated history that nobody wants.

    The solution is either to check each and every commit from $oldrev to $newrev, or to specify the entire range $oldrev..$newrev. Be sure you are not checking $newrev alone, or the hook will let large files into your git history, where they are pushed out to others and are then difficult or impossible to remove.
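
    Checking the whole range can be sketched as follows. This is a minimal illustration rather than the code from any of the answers mentioned here; the function name find_large_blobs is hypothetical, and for a newly created ref (all-zero $oldrev) it simply walks everything reachable from $newrev:

```shell
#!/bin/bash
# Hypothetical helper: print "<size><TAB><path>" for every blob larger than
# max_bytes that is introduced by ANY commit in oldrev..newrev, including
# files that a later commit in the range deletes again.
find_large_blobs() {
  local oldrev="$1" newrev="$2" max_bytes="$3"
  local zero="0000000000000000000000000000000000000000"
  local range
  if [ "$oldrev" = "$zero" ]; then
    range="$newrev"              # new ref: inspect its full history
  else
    range="${oldrev}..${newrev}"
  fi
  # rev-list --objects prints "<sha> <path>"; cat-file looks up each object
  # and passes the path through as %(rest).
  git rev-list --objects "$range" |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v max="$max_bytes" '
      $1 == "blob" && $2 + 0 > max + 0 {
        size = $2
        sub(/^[^ ]+ [^ ]+ /, "")   # keep only the path (may contain spaces)
        print size "\t" $0
      }'
}

# Example use inside an update hook (arguments: refname oldrev newrev):
#   offenders="$(find_large_blobs "$2" "$3" 5242880)"
#   [ -z "$offenders" ] || { echo "$offenders" >&2; exit 1; }
```

    Because the object walk covers every commit in the range, a big file that was added and then removed with git rm is still reported.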

  • 2020-12-02 15:24

    Another way is to version a .gitignore, which will prevent any file with a certain extension from showing up in the status.
    You can still have hooks as well (downstream or upstream, as suggested by the other answers), but at least every downstream repo can include that .gitignore to avoid adding .exe, .dll, .iso, ...
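
    A minimal sketch of such a versioned .gitignore, using the extensions named above; the function name write_binary_ignore is hypothetical. Note that the patterns match on file name, not on size, and that git add -f can still override them:

```shell
#!/bin/bash
# Hypothetical helper: append ignore patterns for common binary artifacts
# to the .gitignore at the top of the given working tree.
write_binary_ignore() {
  local repo_dir="$1"
  cat >> "$repo_dir/.gitignore" <<'EOF'
*.exe
*.dll
*.iso
EOF
}

# Example: check whether a path would be ignored (exit status 0 = ignored):
#   git check-ignore -q setup.exe && echo "setup.exe is ignored"
```

    Committing the resulting .gitignore makes the same patterns available to every clone.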

  • 2020-12-02 15:24

    You need a solution that caters to the following scenarios.

    1. If someone pushes multiple commits together, the hook should check ALL the commits in that push (between oldref and newref) for files greater than a certain limit.
    2. The hook should run for all users. A client-side hook will not be available to all users, since such hooks are not transferred when you do a git push. So what is needed is a server-side hook such as a pre-receive hook.

    This hook (https://github.com/mgit-at/git-max-filesize) deals with the above 2 cases and seems to also correctly handle edge cases such as new branch pushes and branch deletes.
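
    Installing a server-side hook comes down to placing an executable file named pre-receive in the hooks directory of the repository on the server. A minimal sketch, assuming a bare repository and a hook script that has already been downloaded; the function name install_pre_receive_hook and both paths in the example are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: copy a hook script into a bare repository as its
# pre-receive hook and make it executable, so it runs on every push.
install_pre_receive_hook() {
  local repo_dir="$1" hook_script="$2"
  mkdir -p "$repo_dir/hooks" &&
    cp "$hook_script" "$repo_dir/hooks/pre-receive" &&
    chmod +x "$repo_dir/hooks/pre-receive"
}

# Example:
#   install_pre_receive_hook /path/to/project.git ./pre-receive.sh
```

    Hosted services usually do not give direct access to this directory, so this applies to servers you administer yourself.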
