Question
My code eats up to 3GB of memory at a single time. I figured it out using gc():
gc1 <- gc(reset = TRUE)  # reset the "max used" statistics
graf(...) # the code
gc2 <- gc()
# gc() returns a matrix: column 2 is the Mb currently used, column 6 is the
# max Mb used since the last reset; summing over its rows (Ncells + Vcells)
# gives the peak extra memory in Mb
cat(sprintf("mem: %.1fMb.\n", sum(gc2[,6] - gc1[,2])))
# mem: 3151.7Mb.
Which I guess means that there is a single point in time when 3151.7 MB are allocated at once.
My goal is to minimize the maximum memory allocated at any single time. How do I figure out which part of my code is responsible for the peak usage of those 3GB of memory, i.e. the place where those 3GB are allocated at once?
I tried memory profiling with Rprof and profvis, but both seem to show different information (which seems undocumented, see my other question). Maybe I need to use them with different parameters (or use a different tool?).
I've also been looking at Rprofmem, but:
- In the profmem vignette they wrote: "with utils::Rprofmem() it is not possible to quantify the total memory usage at a given time because it only logs allocations and does therefore not reflect deallocations done by the garbage collector."
- How do I output the result of Rprofmem? This source speaks for itself: "Summary functions for this output are still being designed". (A minimal sketch of the raw output follows below.)
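For illustration, a minimal sketch of what the raw Rprofmem log contains (memprof.out and the 1 KB threshold are arbitrary choices of mine):
Rprofmem("memprof.out", threshold = 1024)  # log every allocation of >= 1 KB
x <- matrix(rnorm(1e6), nrow = 1000)       # some example allocations
Rprofmem(NULL)                             # stop logging
head(readLines("memprof.out"))             # one line per allocation: bytes, then the call stack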
Answer 1:
My code eats up to 3GB of memory at a single time.
While it looks like your code is consuming a lot of RAM at once by calling one function, you can break down the memory consumption by looking into the implementation details of that function (and its sub-calls) using RStudio's built-in profiling (based on profvis) to see the execution time and rough memory consumption. E.g., if I use my demo code:
# graf code taken from the tutorial at
# https://rawgit.com/goldingn/intecol2013/master/tutorial/graf_workshop.html
library(dismo) # install.packages("dismo")
library(GRaF)  # devtools::install_github('goldingn/GRaF')
data(Anguilla_train)
# loop to call the code under test several times to get better profiling results
for (i in 1:5) {
  # keep the SegSumT, SegTSeas and Method columns as covariates
  covs <- Anguilla_train[, c("SegSumT", "SegTSeas", "Method")]
  # use the presence/absence status to fit a simple model
  m1 <- graf(Anguilla_train$Angaus, covs)
}
Start profiling with the Profile > Start Profiling menu item, source the above code and stop the profiling via the same menu.
After Profile > Stop Profiling, RStudio shows the result as a Flame Graph, but what you are looking for is hidden in the Data tab of the profile result (in the screenshot I have unfolded all function calls which show heavy memory consumption).
The numbers in the Memory column indicate the memory allocated (positive numbers) and deallocated (negative numbers) for each called function, and the values should include the sum of the whole sub-call tree plus the memory used directly in the function.
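If you prefer to start the profiler from code rather than from the RStudio menu, the same profile can be produced by calling profvis directly (a minimal sketch, reusing the demo code from above):
library(profvis)
p <- profvis({
  for (i in 1:5) {
    covs <- Anguilla_train[, c("SegSumT", "SegTSeas", "Method")]
    m1 <- graf(Anguilla_train$Angaus, covs)
  }
})
print(p)  # opens the interactive result with the Flame Graph and Data tabs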
My goal is to minimize the maximum memory allocated at any single time.
Why do you want to do that? Do you run out of memory, or do you suspect that repeated memory allocation is causing long execution times?
High memory consumption (or repeated allocations/deallocations) often goes together with slow execution since copying memory costs time.
So look at the Memory or Time column, depending on your optimization goal, to find function calls with high values.
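If you want comparable numbers without the RStudio UI, Rprof can also be driven directly and summarized with summaryRprof (a sketch; prof.out is an arbitrary file name, and covs comes from the demo code above):
Rprof("prof.out", memory.profiling = TRUE, interval = 0.01)
m1 <- graf(Anguilla_train$Angaus, covs)    # the code under test
Rprof(NULL)                                # stop profiling
summaryRprof("prof.out", memory = "both")  # time plus memory deltas per function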
If you look into the source code of the GRaF package you can find a loop in the graf.fit.laplace function (up to 50 "Newton iterations") that calls "slow" R-internal functions like chol, backsolve and forwardsolve, but also slow functions implemented in the package itself (like cov.SE.d1).
Now you can try to find faster (or less memory-consuming) replacements for these functions... (sorry, I can't help here).
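One way to evaluate a candidate replacement is a per-expression benchmark that also reports allocations, e.g. with the bench package (my suggestion, not part of the original workflow; the test matrix A is made up for illustration):
library(bench)  # install.packages("bench")
A <- crossprod(matrix(rnorm(500 * 500), 500))  # a positive-definite test matrix
bench::mark(chol(A))  # the result table includes mem_alloc next to the timings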
PS: profvis uses Rprof internally, so the profiling data is collected by probing the current memory consumption at regular time intervals and attributing it to the currently active function (call stack).
Rprof has limitations, mainly that the result is not exact: the garbage collector triggers at non-deterministic times, so freed memory is attributed to whatever function the next probing interval stops at, and memory allocated directly from the OS via C/C++ code or libraries (bypassing R's memory management API) is not recognized at all.
Still, it is the easiest and normally a good enough indication of memory and performance problems...
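If you only need the peak for individual steps, you can also bracket each suspect call with gc(reset = TRUE) as in your question; a hypothetical helper (peak_mb is my own name, not from any package) could look like this:
peak_mb <- function(expr) {
  g1 <- gc(reset = TRUE)       # reset the "max used" statistics
  force(expr)                  # evaluate the code under test
  g2 <- gc()
  sum(g2[, 6]) - sum(g1[, 2])  # peak Mb minus the baseline at reset
}
peak_mb(m1 <- graf(Anguilla_train$Angaus, covs))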
For an introduction to profvis see: https://rstudio.github.io/profvis/
Source: https://stackoverflow.com/questions/58250531/memory-profiling-in-r-how-to-find-the-place-of-maximum-memory-usage