I thought it would be neat if it were possible to take a Git repository, run some script, and have it generate the number of lines in the code base, and the proportion of each a
You probably want gitdm, which can do exactly what you need. We use it for the Mahara project to produce contribution statistics.
Just do what the README suggests:
A typical command line used to generate the "who wrote 2.6.x" LWN articles looks like:
git log -p -M v2.6.19..v2.6.20 | gitdm -u -s -a -o results -h results.html
You can also customise it for your own purposes.
You can use git log, as illustrated in "Which Git commit stats are easy to pull".
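As a rough sketch of the git log route: `git log --numstat` prints added and deleted line counts per file for every commit, and those can be summed per author e-mail. Note that this measures lines changed over history, not lines that survive in the current code base. The function name `sum_numstat_by_author` and the `@` prefix used to mark author lines are choices made for this example, not anything Git defines.

```shell
# Sum added/deleted lines per author from `git log --numstat` output.
# Expects input produced by:
#   git log --numstat --pretty=format:'@%ae'
# where each commit starts with a line "@<author e-mail>".
# sum_numstat_by_author is a hypothetical helper name.
sum_numstat_by_author() {
    awk '
        /^@/ { author = substr($0, 2); next }      # new commit: remember author
        NF >= 3 && $1 ~ /^[0-9]+$/ {               # numstat line (binary files show "-")
            added[author]   += $1
            deleted[author] += $2
        }
        END {
            for (a in added)
                printf "%s +%d -%d\n", a, added[a], deleted[a]
        }
    ' | sort
}
```

Inside a repository this could be run as `git log --numstat --pretty=format:'@%ae' | sum_numstat_by_author`.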
Or you can have a look at the Lookatgit project, which inspects the number of lines changed, as seen in its gitauthor.rb class.
You could try to parse the output of git blame. This command shows who last edited each line of a file.
This example is not exactly what you want but I think it gives you the idea:
git blame -e the/file | awk -F '<|>' '{print $2}' | sort | uniq -c
This prints each author's e-mail address together with the number of lines in the file that they were the last to modify, for example:
47 foo@bar.com
34712 blah@baz.com
To make it run on the whole repository, you can do something like this:
git ls-files | while IFS= read -r f; do git blame -e -- "$f"; done | awk -F '<|>' '{print $2}' | sort | uniq -c
The idea here is to first generate the list of files with git ls-files, and then run the above snippet on each of the files (using the snippet mentioned here). If you're running this on a large codebase, you may want to store intermediate results in temporary files rather than use pipes.
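A slightly more robust variant of the same idea, as a sketch: `git blame --line-porcelain` emits an `author-mail <addr>` header for every blamed line, which avoids splitting source lines on `<` and `>`. The helper below counts those headers and also prints each author's share of the total, since the question asks for proportions. The name `count_blame_authors` is made up for this example.

```shell
# Count blamed lines per author e-mail and print each author's share.
# Reads `git blame --line-porcelain` output on stdin. File content lines
# in that format start with a TAB, so they never match the header pattern.
# count_blame_authors is a hypothetical helper name.
count_blame_authors() {
    awk '
        /^author-mail / {
            mail = $2
            gsub(/[<>]/, "", mail)   # strip the angle brackets around the address
            lines[mail]++
            total++
        }
        END {
            for (m in lines)
                printf "%d %.1f%% %s\n", lines[m], 100 * lines[m] / total, m
        }
    ' | sort -rn
}
```

To run it over a whole repository, something like this should work: `git ls-files | while IFS= read -r f; do git blame --line-porcelain -- "$f"; done | count_blame_authors`. It still runs one blame per file, so it will be slow on a large code base.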