I would very much like a line profiler. This exists in Matlab and Python, and is very useful for finding bits of code that take a lot of time or are executed more (or less) than expected. A lot of my code involves function optimizations and how many times something iterates may not be known in advance (though most iterations are constrained or specified).
The call stack is useful if all of your code is in R and is very simple, but as I recently posted about it, it takes a painstaking effort if your code is complex.
It's quite easy to develop a line profiler for a given bit of code. A naive way is to index every line (or just pre-specified sections) and insert a call to log proc.time()
that line. In a loop, I simply enumerate sections of code and store in a 2 dimensional list the proc.time
values for section i
in iteration k
. [See update below: this isn't actually a way to do a line profiler for all kinds of code.]
One can use such a tool to find hotspots, anomalies (e.g. code that should be O(n) but is really O(n^2)), code that may benefit from memoization (a line profiler doesn't tell you this, but it lets you know where to look), code that is mistakenly inside a loop, and more.
Update 1: Inserting a timing line between every function line is slightly erroneous: the definition of a line of code is not simply code separated by whitespace. Being able to parse the code into an AST is necessary for knowing where operations begin and end. As discussed in some of the answers to this question, there are some tools (namely, showTree
and walkCode
in the codetools
package) for doing this. Simply applying a regular expression to source code would be a very bad thing to do.