Git: Find duplicate blobs (files) in this tree

后端 未结 6 2019
长发绾君心
长发绾君心 2021-02-04 03:32

This is sort of a follow-up to this question.

If there are multiple blobs with the same contents, they are only stored once in the git repository because their SHA-1\'s

6条回答
  •  我在风中等你
    2021-02-04 04:25

    Running this on the codebase I work on was an eye-opener I can tell you!

    #!/usr/bin/perl
    
    # usage: git ls-tree -r HEAD | $PROGRAM_NAME
    
    use strict;
    use warnings;
    
    my $sha1_path = {};
    
    while (my $line = ) {
        chomp $line;
    
        if ($line =~ m{ \A \d+ \s+ \w+ \s+ (\w+) \s+ (\S+) \z }xms) {
            my $sha1 = $1;
            my $path = $2;
    
            push @{$sha1_path->{$sha1}}, $path;
        }
    }
    
    foreach my $sha1 (keys %$sha1_path) {
        if (scalar @{$sha1_path->{$sha1}} > 1) {
            foreach my $path (@{$sha1_path->{$sha1}}) {
                print "$sha1  $path\n";
            }
    
            print '-' x 40, "\n";
        }
    }
    

提交回复
热议问题