Is there any way to see how a file's size has changed through time in a git repository? I want to see how my main.js file (which is the combination of several files and mini
In case this is of use for someone, this script will show the size of a given file in different commits:
git log --format=%H -- <file_name> | while read -r hash; do
    printf '%s -- ' "$hash"
    git show "$hash":<file_path_off_of_git_root_without_leading_slash> | wc -c
done
While commands like git log <filename>, git whatchanged, etc. can show the history pertaining to that file, I don't see anywhere in either the built-in or custom pretty formats an option that shows size (sadly, the --log-size option is only for the log messages!).
However, you can get a rough idea of the size by seeing the total number of lines added and removed in each commit. You can sort of visualize it with the command git log --stat <filename>, which uses plus and minus signs. Or use git log --numstat <filename> to collect the number of lines added or removed in each commit and use the numbers in some other visualization.
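As a rough sketch of the --numstat idea, the function below keeps a running net line count per commit, oldest first. The name line_growth is made up for this example, and the total only matches the real line count if the history starts with the file's creation and the file was never renamed.

```shell
# Hypothetical helper: print "<short hash> +added -removed running_total"
# for every commit that touched the file, oldest commit first.
line_growth() {
    # $1: path of the file, relative to the current directory
    git log --reverse --numstat --pretty=format:%h -- "$1" |
    awk '
        NF == 1 { commit = $1; next }      # line holding only the short hash
        NF >= 3 && $1 != "-" {             # numstat line: added, removed, path
            total += $1 - $2
            print commit, "+" $1, "-" $2, total
        }'
}
```

Call it as line_growth main.js from inside the repository. Binary files show "-" instead of numbers in --numstat output, so those lines are skipped.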
Create a file called .gitattributes and add the following line:
main.js -diff
This turns off line-based diffs for main.js. Now run the following command:
git log --stat main.js
The log will include lines like
main.js | Bin 4316 -> 4360 bytes
After you're done, you should probably delete .gitattributes. I don't know what other changes in git's behavior may be caused by the -diff attribute.
Tested with git versions 1.7.12.4 and 1.7.9.5.
Source: ewall's answer and https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_marking_files_as_binary
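If you would rather not create and delete a .gitattributes file in the working tree, one possible variation is to put the same line in the repository-local $GIT_DIR/info/attributes file just for the duration of the command. This is only a sketch: the function name stat_as_binary is invented here, and it assumes the path contains no whitespace.

```shell
# Hypothetical helper: run "git log --stat" with the file temporarily
# marked -diff via the repository-local info/attributes file.
stat_as_binary() {
    # $1: path of the file, relative to the repository root
    sab_attrs="$(git rev-parse --git-dir)/info/attributes"
    mkdir -p "$(dirname "$sab_attrs")"
    sab_backup=
    if [ -f "$sab_attrs" ]; then
        # Keep a copy of any existing attributes so they can be restored.
        sab_backup=$(mktemp)
        cp "$sab_attrs" "$sab_backup"
    fi
    printf '%s -diff\n' "$1" >> "$sab_attrs"
    git log --stat -- "$1"
    # Put things back the way they were.
    if [ -n "$sab_backup" ]; then
        mv "$sab_backup" "$sab_attrs"
    else
        rm -f "$sab_attrs"
    fi
}
```

Run it from the repository root, e.g. stat_as_binary main.js. The output should contain the same "Bin ... -> ... bytes" lines as above.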
You could create a script that uses the output from git show --pretty=raw <commit> to obtain the tree, then uses git ls-tree -r -l to obtain the blob you are looking for, including the file size.
In case you have ruby and the grit gem installed, here's a little script I threw together:
require 'grit'

if ARGV.size < 1
  puts 'usage: file-size FILE'
  puts 'run from within the git repo root'
  exit
end

filename = ARGV[0].to_s
repo = Grit::Repo.new('.')
commits = repo.log('master', filename)
commits.each do |commit|
  blob = commit.tree/filename
  puts "#{commit} #{blob.size} bytes"
end
Example usage (the script is saved as file-size.rb); this shows the history for somedir/somefile:
myproject$ ruby file-size.rb somedir/somefile
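For those without Ruby, the same idea (take the tree from git show --pretty=raw, then look the file up with git ls-tree -r -l) can be sketched in plain shell. The function names below are made up for illustration, and the sketch assumes it is run from the repository root on a path that exists in each listed commit.

```shell
# Hypothetical helper names; run from the repository root.

# Print the size in bytes of path $2 as recorded in commit $1.
file_size_at() {
    # The raw commit header contains a line of the form "tree <hash>".
    fsa_tree=$(git show -s --pretty=raw "$1" |
        awk '$1 == "tree" { print $2; exit }')
    # With -l, the fourth column of ls-tree output is the blob size.
    git ls-tree -r -l "$fsa_tree" "$2" | awk '{ print $4 }'
}

# Print "<commit> <size>" for every commit that touched path $1.
file_size_history() {
    git rev-list HEAD -- "$1" | while read -r fsh_commit; do
        printf '%s %s\n' "$fsh_commit" "$(file_size_at "$fsh_commit" "$1")"
    done
}
```

Usage would be file_size_history somedir/somefile. For a commit that deleted the file, the size column simply comes out empty.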
You can use git ls-tree -r -l <revision> <path> to get the blob size at a given revision, e.g.
$ git ls-tree -r -l v1.6.0 gitweb/README
100644 blob 825162a0b6dce8c354de67a30abfbad94d29fdde 16067 gitweb/README
The blob size in this example is '16067'. The disadvantage of this solution is that git ls-tree can process only one revision at a time.
Alternatively, you can use git cat-file --batch-check < <list-of-objects>, feeding it blob identifiers. If the location of the file didn't change through history (the file was not moved), you can use git rev-list <starting-point> -- <path> to get the list of revisions touching the given path, translate them into names of blobs using the <revision>:<path> extended SHA-1 syntax (see the git-rev-parse manpage), and feed the result to git cat-file. Example:
$ git rev-list -5 v1.6.0 -- gitweb/README | sed -e 's/$/:gitweb\/README/g' | git cat-file --batch-check
825162a0b6dce8c354de67a30abfbad94d29fdde blob 16067
6908036402ffe56c8b0cdcebdfb3dfacf84fb6f1 blob 16011
356ab7b327eb0df99c0773d68375e155dbcea0be blob 14248
8f7ea367bae72ea3ce25b10b968554f9b842fffe blob 13853
8dfe335f73c223fa0da8cd21db6227283adb95ba blob 13801
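The pipeline above prints blob ids, which makes it hard to tell which commit each size belongs to. A possible variation, sketched below, passes a custom format to --batch-check so that each line shows the size next to the commit. The function name blob_sizes is invented for this example, and the sketch assumes a git version that supports the %(objectsize) and %(rest) placeholders and a path without whitespace.

```shell
# Hypothetical helper: print "<size in bytes> <commit>" for every commit
# that touched the given path, newest first. Run from the repository root.
blob_sizes() {
    # $1: path relative to the repository root (no whitespace)
    git rev-list HEAD -- "$1" |
    while read -r bs_rev; do
        # Input line: "<rev>:<path> <rev>". Everything after the first
        # space is echoed back by cat-file as %(rest).
        printf '%s:%s %s\n' "$bs_rev" "$1" "$bs_rev"
    done |
    git cat-file --batch-check='%(objectsize) %(rest)'
}
```

For example, blob_sizes gitweb/README. A commit in which the file was deleted shows up as a line ending in "missing" rather than a size.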
Here is a Bash function that will report the size over time in the following format.
LoC Date                      Commit ID  Subject
942 2019-08-31 18:09:34 +0200 35fc67c122 Declare some XML namespaces in replacement of OGCPrefixMapper, which has been removed from Apache SIS. https://issues.apache.org/jira/browse/SIS-126
943 2019-08-09 16:52:29 +0200 e8438ab869 fix(GML): fix relative path resolving inside a jar
934 2019-08-05 15:37:46 +0200 1e0c0b03c4 fix(GML): fix all test cases
932 2019-07-30 15:54:53 +0200 fddea5db24 feat(GML): work on fallback for non-xsd Feature store
932 2019-07-23 16:40:23 +0200 8d9a6a7dd0 feat(GML): improve support for custom XML mappings
932 2019-06-26 15:18:43 +0200 43ea6e0bd7 feat(GML): add concurrency support for read/write operations
932 2019-06-21 09:27:41 +0200 07a9993b4b feat(GML): support group reference min/max occurs attributes
932 2019-06-21 09:27:41 +0200 352a9104ae feat(GML): fix resolving local files xsd paths
919 2018-06-08 15:41:26 +0200 01ac7538e7 Merge branch 'master' into sis-migration
919 2018-05-16 16:40:04 +0200 16fe7590c5 fix(JAXP): various fix for WFS 2.0.0
912 2018-04-11 10:09:22 +0200 bf3a38bdc4 chore(*): update JTS version 1.15.0
912 2017-11-09 20:15:23 +0100 bc14dc4be1 fix(Client): fix minor problems on WFS querying
901 2017-10-20 11:41:43 +0200 f686d7ff15 feat(Storage): add support for GML 2.1.2
882 2017-05-16 23:07:31 +0200 f20c34c1e2 refactor(Feature): renamed the Geotk flavor of org.apache.sis.feature package as org.geotoolkit.feature.
Here is the function:
git-log-size() {
    git rev-list HEAD -- "$1" | while read -r cid; do
        git cat-file blob "$cid:$1" | wc -l | tr -d '\n'
        echo -n $'\t'
        git log -1 "--pretty=%ci%x09%h%x09%s" "$cid"
    done | column -t -s$'\t'
}
It is not particularly efficient, since it runs several processes per commit, but it does the job. It uses some pretty common utilities (wc, tr, column).
The size is reported as lines of code (LoC) since this is the common metric in software development. Change the -l option of wc if you prefer something else (for example, -c for bytes).
Here is how to call it:
git-log-size <path>