Getting more info from Rprof()

Asked by 情书的邮戳 on 2020-12-31 22:01 · 3 answers · 454 views

I've been trying to dig into what the time-hogs are in some R code I've written, so I'm using Rprof. The output isn't yet very helpful, though:



        
3 Answers
  • 2020-12-31 22:54

    Rprof takes samples of the call stack at intervals of time - that's the good news.

    What I would do is get access to the raw stack samples (stackshots) that it collects, and pick several at random and examine them. What I'm looking for is call sites (not just functions, but the places where one function calls another) that appear on multiple samples. For example, if a call site appears on 50% of samples, then that's what it costs, because its possible removal would save roughly 50% of total time. (Seems obvious, right? But it's not well known.)

    Not every costly call site is optimizable, but some are, unless the program is already as fast as possible.

    (Don't be distracted by issues like how many samples you need to look at. If something is going to save you a reasonable fraction of time, then it appears on a similar fraction of samples. The exact number doesn't matter. What matters is that you find it. Also don't be distracted by graph and recursion and time measurement and counting issues. What matters is, for each call site you see, the fraction of stack samples that show it.)
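To make that concrete, here is a minimal sketch, assuming the default `Rprof()` output format (a `sample.interval=` header line, then one call stack per line with quoted function names, innermost call first). It tallies what fraction of samples each caller -> callee call site appears on:

```r
## Read an Rprof output file into a list of stack samples.
## Assumes the default format: header line, then one quoted stack per line.
parse_samples <- function(path = "Rprof.out") {
  lines <- readLines(path)[-1]   # drop the sample.interval= header
  lapply(strsplit(trimws(lines), '" "', fixed = TRUE),
         function(s) gsub('"', "", s, fixed = TRUE))
}

## For each caller -> callee pair, the fraction of samples it appears on.
## Pairs are deduplicated within a sample so recursion doesn't inflate counts.
call_site_fraction <- function(samples) {
  pairs <- unlist(lapply(samples, function(s) {
    if (length(s) < 2) return(character(0))
    # stacks are innermost-first, so s[i+1] calls s[i]
    unique(paste(s[-1], s[-length(s)], sep = " -> "))
  }))
  sort(table(pairs) / length(samples), decreasing = TRUE)
}
```

After `Rprof(); ...; Rprof(NULL)`, calling `call_site_fraction(parse_samples("Rprof.out"))` lists call sites by the fraction of samples they appear on; the ones near the top are, per the reasoning above, where the time goes.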

  • 2020-12-31 22:54

    Parsing the output that Rprof generates isn't too hard, and then you get access to absolutely everything.
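As a sketch of that: the file `Rprof()` writes is plain text (a `sample.interval=` header, then one quoted call stack per line), and base R's `summaryRprof()` already aggregates it. The file name and the workload below are placeholders:

```r
## Profile a stand-in workload, then inspect both the built-in summary
## and the raw sample file itself.
Rprof("profile.out", interval = 0.01)
x <- 0
for (i in 1:2e6) x <- x + sqrt(i)     # placeholder workload
Rprof(NULL)

summaryRprof("profile.out")$by.total  # inclusive time per function
raw <- readLines("profile.out")       # line 1: sample.interval=...
head(raw)                             # then one quoted stack per sample
```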

  • 2020-12-31 23:08

The existing CRAN packages profr and proftools are useful for this. The latter can use Rgraphviz, which isn't always installable.

The R Wiki page on profiling has additional info and a nice script by Romain which can also visualize the results (but requires graphviz).
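A hedged sketch of both packages in use (package and function names as documented on CRAN; the workload and file name are placeholders, and each package is skipped if not installed):

```r
## Profile a placeholder workload, then hand the file to each package.
Rprof("profile.out")
invisible(replicate(20, sort(runif(1e5))))   # placeholder workload
Rprof(NULL)

if (requireNamespace("profr", quietly = TRUE)) {
  p <- profr::parse_rprof("profile.out")  # data frame of stack samples
  plot(p)                                 # call depth over time
}

if (requireNamespace("proftools", quietly = TRUE)) {
  pd <- proftools::readProfileData("profile.out")
  proftools::flatProfile(pd)              # flat timing table
  # proftools::plotProfileCallGraph(pd)   # call graph; needs Rgraphviz
}
```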
