I have some variables that I no longer need once they have been used, so I want to remove them and release their memory, but rm() does not seem to help:
memory.siz
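I use gc() to free up RAM between operations. Note that rm() only removes the name binding from the environment; the memory behind the object is reclaimed by R's garbage collector, which runs automatically but can be triggered explicitly with gc(). Even then, R may not hand the freed memory back to the operating system straight away. Below is an example of how I use it in a loop, but see here for a more detailed discussion of gc() and here for more on memory management during an R session.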
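To see what rm() and gc() each contribute, here is a minimal base-R sketch: it allocates a large vector, removes it with rm(), and compares R's own vector-cell usage (as reported by gc()) before and after. The helper vcells_used() is hypothetical, introduced only for this illustration, and the exact counts will vary by R version and session.

```r
# Hypothetical helper: run a full garbage collection and return the number of
# vector cells (8-byte units) R currently has in use.
vcells_used <- function() gc()["Vcells", "used"]

baseline <- vcells_used()
x <- runif(1e7)              # 1e7 doubles x 8 bytes = 8e7 bytes, roughly 80 MB
with_x <- vcells_used()      # usage now includes the ~1e7 cells held by x

rm(x)                        # removes the binding; the vector is now unreferenced
after_rm <- vcells_used()    # the gc() inside the helper reclaims it

cat("Vcells used  baseline:", baseline,
    " with x:", with_x,
    " after rm() + gc():", after_rm, "\n")
```

The drop between `with_x` and `after_rm` is R's internal accounting; an OS-level monitor may still show the R process holding on to some of that memory.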
# load library
library(topicmodels)
# get data
data("AssociatedPress")
# set number of topics to start with
k <- 20
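# set model options (alpha = 50/k is evaluated once, here, with k = 20, so
# every run in the loop below starts from alpha = 2.5; it is then re-estimated
# because estimate.alpha = TRUE)
control_LDA_VEM <-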
list(estimate.alpha = TRUE, alpha = 50/k, estimate.beta = TRUE,
verbose = 0, prefix = tempfile(), save = 0, keep = 0,
seed = as.integer(100), nstart = 1, best = TRUE,
var = list(iter.max = 10, tol = 10^-6),
em = list(iter.max = 10, tol = 10^-4),
initialize = "random")
# create the sequence that stores the number of topics to
# iterate over
sequ <- seq(20, 300, by = 20)
# basic loop to iterate over different topic numbers, with gc()
# after each run to free up RAM
# note: indexing by k (20, 40, ..., 300) grows the list to length 300;
# lda[sequ] below picks the fitted models back out
lda <- vector(mode = "list", length = length(sequ))
for (k in sequ) {
  lda[[k]] <- LDA(AssociatedPress[1:20, ], k, method = "VEM", control = control_LDA_VEM)
  gc() # here's where I put the garbage collection to free up memory before the next round of the loop
}
# convert list output to dataframe (suggestions for a simpler method are welcome!)
best.model.logLik <- data.frame(logLik = as.matrix(lapply(lda[sequ], logLik)), ntopic = sequ)
# plot
with(best.model.logLik, plot(ntopic, logLik, type = 'l', xlab="Number of topics", ylab="Log likelihood"))
# print ordered dataframe to see which number of topics has the highest log likelihood
(best.model.logLik.sort <- best.model.logLik[order(-as.numeric(best.model.logLik$logLik)), ])
logLik ntopic
2 -17904.12 40
3 -18105.48 60
1 -18181.84 20
4 -18569.7 80
5 -19736.94 100
6 -21919.6 120
7 -23785.08 140
8 -24914.23 160
9 -25493.76 180
10 -25837.64 200
11 -25964.23 220
12 -26061.01 240
13 -26117.92 260
14 -26149.44 280
15 -26168.91 300
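If only the log likelihoods are needed afterwards, a lower-memory variant of the loop keeps just those numbers rather than every fitted model, calling rm() on each fit before gc(). It also builds the log-likelihood data frame directly from a numeric vector, which is one simpler alternative to the as.matrix(lapply(...)) conversion. So that the sketch runs with base R alone, lm() on the built-in cars data is swapped in for LDA(), and `degrees` stands in for `sequ`; both substitutions are assumptions for illustration only.

```r
# Stand-in for the LDA loop: lm() replaces LDA() and 'degrees' replaces 'sequ'.
degrees <- 1:5
ll <- numeric(length(degrees))   # pre-allocated vector for the log likelihoods

for (i in seq_along(degrees)) {
  fit <- lm(dist ~ poly(speed, degrees[i]), data = cars)
  ll[i] <- as.numeric(logLik(fit))   # keep only the number, not the model
  rm(fit)                            # drop the only reference to the fitted model
  gc()                               # reclaim its memory before the next fit
}

# Build the data frame from a plain numeric vector and sort by log likelihood
res <- data.frame(logLik = ll, degree = degrees)
res_sorted <- res[order(res$logLik, decreasing = TRUE), ]
print(res_sorted)
```

With topicmodels, the same pattern would put the LDA() call from the loop above in place of lm() and keep `as.numeric(logLik(...))` unchanged.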