Question
What is the correct way to profile memory in R code that contains calls to data.table
functions? Let's say I want to determine the maximum memory usage during an expression.
This reference indicates that Rprofmem may not be the right choice:
https://cran.r-project.org/web/packages/profmem/vignettes/profmem.html
All memory allocations that are done via the native allocVector3() part of R's native API are logged, which means that nearly all memory allocations are logged. Any objects allocated this way are automatically deallocated by R's garbage collector at some point. Garbage collection events are not logged by profmem(). Allocations not logged are those done by non-R native libraries or R packages that use native code Calloc() / Free() for internal objects. Such objects are not handled by the R garbage collector.
The data.table source code contains plenty of calls to Calloc() and malloc(), so this suggests that Rprofmem will not measure all memory allocated by data.table functions. If Rprofmem is not the right tool, how come Matthew Dowle uses it here: R: loop over columns in data.table?
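To make concrete what Rprofmem does record, here is a minimal sketch. It assumes an R build with memory profiling enabled (checked with capabilities("profmem")); log_r_allocations is a hypothetical helper name, not part of any package. Only allocations made through R's own allocator show up in the log, so memory obtained with Calloc() or malloc() in native code would be missing from it.

```r
# Hypothetical helper: log R-level allocations made while evaluating `expr`.
# Returns the raw log lines, or NULL if this R build lacks memory profiling.
log_r_allocations <- function(expr, threshold = 0) {
  if (!capabilities("profmem")) return(NULL)
  log_file <- tempfile(fileext = ".out")
  # Start logging allocations larger than `threshold` bytes
  Rprofmem(log_file, threshold = threshold)
  # Evaluate the lazily passed expression; always stop logging afterwards
  tryCatch(force(expr), finally = Rprofmem(NULL))
  log_lines <- readLines(log_file)
  unlink(log_file)
  log_lines
}

# Usage: each log line starts with the allocation size in bytes,
# followed by the call stack that triggered it
alloc_log <- log_r_allocations(numeric(1e6))
head(alloc_log)
```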
I've found a reference suggesting similar potential issues for gc() (which can be used to measure maximum memory usage between two calls to gc()):
https://r.789695.n4.nabble.com/Determining-the-maximum-memory-usage-of-a-function-td4669977.html
gc() is a good start. Call gc(reset = TRUE) before and gc() after your task, and you will see the maximum extra memory used by R in the interim. (This does not include memory malloced by compiled code, which is much harder to measure as it gets re-used.)
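The gc() recipe quoted above can be sketched as follows. peak_r_heap_mb is a hypothetical helper name; the sketch assumes that the last column of the matrix returned by gc() is "max used" in Mb and the second column is "used" in Mb, as in current R releases. As the quote notes, memory malloced by compiled code is not included.

```r
# Hypothetical helper: extra peak R heap usage (in Mb) while evaluating `expr`.
peak_r_heap_mb <- function(expr) {
  # Collect garbage and reset the "max used" counters to current usage
  baseline <- gc(reset = TRUE)
  # Evaluate the lazily passed expression
  force(expr)
  # Collect again; the "max used" counters now cover the evaluation
  after <- gc()
  # Sum Ncells and Vcells: peak (last column) minus baseline "used" (Mb)
  sum(after[, ncol(after)]) - sum(baseline[, 2])
}

# Usage: allocating ten million doubles takes about 76 Mb of R heap
peak_r_heap_mb(numeric(1e7))
```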
Nothing I've found suggests that similar issues exist with Rprof(memory.profiling=TRUE). Does this mean that the Rprof approach will work for data.table even though it doesn't always use the R API to allocate memory? If Rprof(memory.profiling=TRUE) in fact is not the right tool for the job, what is? Would ssh.utils::mem.usage work?
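Answer 1:
This is not specific to data.table. Recently there was a discussion on Twitter about the same behaviour in dplyr: https://mobile.twitter.com/healthandstats/status/1182840075001819136
You can measure the peak resident memory of the whole process from outside R with GNU time: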
/usr/bin/time -v Rscript -e 'library(data.table); CJ(1:1e4, 1:1e4)' |& grep resident
There is also the interesting cgmemtime project, but it requires a bit more setup.
If you are on Windows, I suggest moving to Linux.
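On Linux, the peak resident set size can also be read from inside a running R session, which covers memory allocated by native code as well as by R. A minimal sketch, assuming a Linux /proc filesystem; peak_rss_kb is a hypothetical helper name. Note that VmHWM is a high-water mark for the whole lifetime of the process, so it only tells you about an expression if that expression pushes the peak higher.

```r
# Hypothetical helper: peak resident set size of this R process, in kB.
# Returns NA when /proc/self/status is unavailable (e.g. not on Linux).
peak_rss_kb <- function() {
  status_file <- "/proc/self/status"
  if (!file.exists(status_file)) return(NA_real_)
  status <- readLines(status_file)
  # VmHWM is the peak resident set size ("high-water mark")
  hwm <- grep("^VmHWM:", status, value = TRUE)
  if (length(hwm) == 0) return(NA_real_)
  # Keep only the digits of a line such as "VmHWM:   123456 kB"
  as.numeric(gsub("[^0-9]", "", hwm[1]))
}

# Usage: compare the high-water mark before and after an operation
before <- peak_rss_kb()
x <- numeric(1e7)
after <- peak_rss_kb()
after - before
```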
Answer 2:
If you are using Windows, you can query the PowerShell process and performance objects for RGui and for Memory Compression (which Windows uses to store frequently used objects) by issuing system commands from R, and read the various memory counters. I don't yet have a way to store the PowerShell objects in R. PowerShell code for RGui and 'Memory Compression':
$t1 = ps | where {$_.Name -EQ 'RGui' -or $_.Name -EQ 'Memory Compression'};
$t2 = $t1 | Select { $_.Id;
[math]::Round($_.WorkingSet64/1MB);
[math]::Round($_.PrivateMemorySize64/1MB);
[math]::Round($_.VirtualMemorySize64/1MB) };
$t2 | ft *
$t1 | gm -View All
$t1.Modules
$t1.MaxWorkingSet
PowerShell embedded in R:
ps_f <- function() { system("powershell -ExecutionPolicy Bypass -command $t1 = ps | where {$_.Name -EQ 'RGui' -or $_.Name -EQ 'Memory Compression'};
$t2 = $t1 | Select {
$_.Id;
[math]::Round($_.WorkingSet64/1MB);
[math]::Round($_.PrivateMemorySize64/1MB);
[math]::Round($_.VirtualMemorySize64/1MB) };
$t2 | ft * "); }
ps_f()
$_.Id;
[math]::Round($_.WorkingSet64/1MB);
[math]::Round($_.PrivateMemorySize64/1MB);
[math]::Round($_.VirtualMemorySize64/1MB)
-----------------------------------------------------------------------------------------------------------------------
{2264, 1076, 3, 1401}
{15832, 3544, 6691, 11965}
ps_mem <- function() { system("powershell -ExecutionPolicy Bypass -command $t1 = ps | where {$_.Name -EQ 'RGui' -or $_.Name -EQ 'Memory Compression'};
$t1 | Select ProcessName,MaxWorkingSet,MinWorkingSet,PagedMemorySize64,NonpagedSystemMemorySize64;")}
> ps_mem()
ProcessName : Memory Compression
MaxWorkingSet :
MinWorkingSet :
PagedMemorySize64 : 3411968
NonpagedSystemMemorySize64 : 0
ProcessName : Rgui
MaxWorkingSet : 1413120
MinWorkingSet : 204800
PagedMemorySize64 : 7014719488
NonpagedSystemMemorySize64 : 6662736
# run some data.table operation
> ps_mem()
ProcessName : Memory Compression
MaxWorkingSet :
MinWorkingSet :
PagedMemorySize64 : 3411968
NonpagedSystemMemorySize64 : 0
ProcessName : Rgui
MaxWorkingSet : 1413120
MinWorkingSet : 204800
PagedMemorySize64 : 7015915520
NonpagedSystemMemorySize64 : 6662736
PowerShell code:
$t1 | where {$_.ProcessName -eq "Rgui"} | Measure-Object -Maximum *memory* | ft Property,Maximum
PowerShell embedded in R:
ps_mem_ <- function() { system("powershell -ExecutionPolicy Bypass -command $t1 = ps | where {$_.Name -EQ 'RGui' -or $_.Name -EQ 'Memory Compression'};
$t2 = $t1 | where {$_.ProcessName -eq 'Rgui'};
$t2 | Measure-Object -Maximum *memory* | ft Property,Maximum ")}
# the 32-bit properties (those without the 64 suffix) roll over and show negative values; read the ...64 variants instead
> ps_mem_()
Property Maximum
-------- -------
NonpagedSystemMemorySize 6662736
NonpagedSystemMemorySize64 6662736
PagedMemorySize -1570734080
PagedMemorySize64 7019200512
PagedSystemMemorySize 680240
PagedSystemMemorySize64 680240
PeakPagedMemorySize -1260961792
PeakPagedMemorySize64 11623940096
PeakVirtualMemorySize -161009664
PeakVirtualMemorySize64 17018859520
PrivateMemorySize -1570734080
PrivateMemorySize64 7019200512
VirtualMemorySize -339103744
VirtualMemorySize64 12545798144
# run some data.table operation
> ps_mem_()
Property Maximum
-------- -------
NonpagedSystemMemorySize 6662736
NonpagedSystemMemorySize64 6662736
PagedMemorySize -1570734080
PagedMemorySize64 7019200512
PagedSystemMemorySize 680240
PagedSystemMemorySize64 680240
PeakPagedMemorySize -1260961792
PeakPagedMemorySize64 11623940096
PeakVirtualMemorySize -161009664
PeakVirtualMemorySize64 17018859520
PrivateMemorySize -1570734080
PrivateMemorySize64 7019200512
VirtualMemorySize -339103744
VirtualMemorySize64 12545798144
To see all the members of the Rgui process object:
$t1 | gm -View All
TypeName: System.Diagnostics.Process
Name MemberType Definition
---- ---------- ----------
Handles AliasProperty Handles = Handlecount
Name AliasProperty Name = ProcessName
NPM AliasProperty NPM = NonpagedSystemMemorySize64
PM AliasProperty PM = PagedMemorySize64
SI AliasProperty SI = SessionId
VM AliasProperty VM = VirtualMemorySize64
WS AliasProperty WS = WorkingSet64
Disposed Event System.EventHandler Disposed(System.Object, System.EventArgs)
ErrorDataReceived Event System.Diagnostics.DataReceivedEventHandler ErrorDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
...
Source: https://stackoverflow.com/questions/58278838/memory-profiling-with-data-table