Apply a command to all commits

时光说笑 2021-01-20 23:22

In an attempt to gather some statistics about a Git repository, I'm looking for a way to do the following:

  • For each commit, execute a command (ex; du -h
2 answers
  • 2021-01-20 23:52

    I don't see how you could do this without checking out each commit, so that is going to take a while on a large repository.

    Here's how you could go about it with bash:

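    #!/bin/bash

    # Walk every commit (newest first) and print: hash, working-tree size, commit date.
    while read -r co dt; do
        git checkout "$co" > /dev/null 2>&1
        # Size of the checked-out files, not counting the .git directory
        size=$(du -hs --exclude=.git . | cut -f1)
        echo "$co $size $dt"
    done < <(git rev-list --pretty=format:"%H %ci" --all --date-order | grep -v "^commit")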
    

    Warning: this leaves you in a detached HEAD state on the oldest commit, which is not a nice place to be.
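
    One way to avoid ending up on the oldest commit is to remember where you started and return there once the loop finishes. The following is only a sketch under a few assumptions: the working tree is clean, and `for_each_commit` is a hypothetical helper name (not a git command) that prints each commit hash followed by the output of whatever command you pass it.

    ```shell
    #!/bin/bash
    # Sketch: run a command on every commit, then return to the starting point.
    # for_each_commit is a hypothetical helper name, not part of git.
    for_each_commit() {
        local start co
        # Remember the current branch name, or the commit hash if HEAD is already detached.
        start=$(git symbolic-ref --short -q HEAD) || start=$(git rev-parse HEAD)
        for co in $(git rev-list --all --date-order); do
            # Stop early if a checkout fails (for example, because of local changes).
            git checkout -q "$co" || break
            # Print the hash, then run the caller's command in the checked-out tree.
            printf '%s ' "$co"
            "$@"
        done
        # Go back to where we started, whether or not the loop completed.
        git checkout -q "$start"
    }
    ```

    It could be called as `for_each_commit du -hs --exclude=.git .` to get one line per commit with the hash and the size.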

  • 2021-01-20 23:58

    To compute the size of each commit in the repo, it will be pretty slow to check out each commit. For one thing, you are duplicating a lot of work, since you'll be recomputing sizes of files that are not changing. Also, you will hammer your filesystem constantly checking things out. Here is a script that queries the git repo to get the info you need. The primary benefit is that you never actually look at any of the blobs to compute their size, but just ask git to tell you. Also, you only query git for each blob once (through the magic of Memoize).
    There is no doubt that this script needs work (an autodie to catch any git failures would be a good idea), but it should give you a place to start. (I've modified this from the original posting to accept an optional argument, which is passed to git rev-list as a revision range. If called with no argument, the script prints info for every commit in history. Passing a range limits the work: for example, if you have tags v0 and v1, you can pass "v0..v1" as the first argument.)

    #!/usr/bin/env perl
    
    use warnings;
    use strict;
    use Memoize;
    
    my $rev_list = $ARGV[ 0 ] || "--all";
    
    # Query git for the size of a blob.  This is memoized, so we only
    # ask for any blob once.
    sub get_blob_size($) {
        my $hash = shift;
        my $size = qx( git cat-file -s $hash );
        return int( $size );
    }
    memoize( 'get_blob_size' );
    
    # Recursively compute the size of a tree.  Note that git cat-file -s
    # does not give the cumulative size of all the blobs in a tree.
    sub compute_tree_size($);
    sub compute_tree_size($) {
        my $sha = shift;
        my $size = 0;    # start at 0 so an empty tree reports 0 instead of undef
        open my $objects, '-|', "git cat-file -p $sha";
        while( <$objects> ) {
            my ( $mode, $type, $hash, $name ) = split;
            if( $type eq 'blob' ) {
                $size += get_blob_size( $hash );
            } elsif( $type eq 'tree' ) {
                $size += compute_tree_size( $hash );
            }
        }
        return $size;
    }
    memoize( 'compute_tree_size' );
    
    # Generate a list of all commits
    open my $objects, '-|', "git rev-list $rev_list |
        git cat-file --batch-check";
    
    # Traverse the commit list and report on the size of each.
    while( <$objects> ) {
        my( $commit, $type, $size ) = split;
        my( $tree, $date ) = split( '@',
            qx( git show --format="%T@%ci" $commit | sed 1q ));
        chomp $date;    # strip only the trailing newline
        printf "%s: %d\n", $date, compute_tree_size $tree;
    }
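
    For comparison, the same idea of asking git for sizes rather than checking files out can be sketched in bash. This is not the memoized approach of the Perl script: it relies on `git ls-tree -r -l`, which lists every blob reachable from a commit together with its size, and re-sums the list for each commit. `commit_sizes` is a hypothetical helper name, and its optional arguments are assumed to be a revision range for git rev-list, as in the script above.

    ```shell
    #!/bin/bash
    # Sketch: total blob size of each commit's tree, without checking anything out.
    # commit_sizes is a hypothetical helper name; its arguments go to git rev-list.
    commit_sizes() {
        local commit date
        # Default to --all when no revision range is given.
        [ "$#" -gt 0 ] || set -- --all
        git rev-list "$@" | while read -r commit; do
            date=$(git show -s --format=%ci "$commit")
            # ls-tree -r -l prints the blob size in the 4th column;
            # submodule entries show "-" there, which awk treats as 0.
            git ls-tree -r -l "$commit" |
                awk -v d="$date" '{ total += $4 } END { printf "%s: %d\n", d, total }'
        done
    }
    ```

    Calling `commit_sizes v0..v1` would print one "date: bytes" line per commit in that range. It starts two git processes per commit rather than one per blob, at the cost of re-reading unchanged entries each time.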
    