How can I have R utilize more of the processing power on my PC?

落花浮王杯 submitted on 2020-01-17 07:20:34

Question


R version: 3.2.4

RStudio version: 0.99.893

Windows 7

Intel i7

480 GB RAM

str(df) 161976 obs. of 11 variables

I am a relative novice to R and do not have a software programming background. My task is to perform clustering on a data set.

The variables have been scaled and centered. I am using the following code to find the optimal number of clusters:

d <- dist(df, method = "euclidean")

library(cluster)

library(fpc)  # pamk() is provided by the fpc package, not by cluster

pamk.best <- pamk(d)

plot(pam(d, pamk.best$nc))

I have noticed that the system never uses more than 22% of the CPU's processing power.

I have taken the following actions so far:

  1. I tried, unsuccessfully, to change the Set Priority and Set Affinity settings for rsession.exe in the Processes tab of the Windows Task Manager. For some reason the priority always reverts to Low, even when I set it to High, Realtime, or anything else on that list. The Set Affinity setting shows that the system is allowing R to use all of the cores.
  2. I have adjusted the High Performance settings on my machine by going into Control Panel -> Power Options -> Change advanced power settings -> Processor Power Management and setting it to 100%.
  3. I have read the CRAN Task View for High Performance Computing on parallel processing. I may be wrong, but I don't think that calculating distances between observations in a data set is a task that should be parallelized, in the sense of dividing the data set into subsets and performing the distance calculations on the subsets in parallel on different cores. Please correct me if I am wrong.

One option I have is to perform clustering on a subset of the data set and then predict cluster membership for the rest of the data set. But, I am thinking that if I have the processing power and the memory available, why can't I perform the clustering on the whole data set!

Is there a way to have the machine or R use a higher percentage of the processing power and complete the task more quickly?

EDIT: I think that my issue is different from the one described in Multithreading in R because I am not trying to run different functions in R. Rather, I am running only one function on one dataset and would like the machine to use more processing power that is available to it.


Answer 1:


It is probably using one core only.

There is no automatic way to parallelize computations. So what you need to do is rewrite parts of R (here, probably the dist and pam functions, which supposedly are C or Fortran code) to use more than one core.
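A quick way to check whether the single-core explanation fits is to compare the observed CPU share with the number of logical cores R can see. The sketch below uses detectCores() from the base parallel package. The reading that one busy thread shows up as roughly 100/N percent of total CPU is an assumption about how Task Manager reports usage, not something measured on the asker's machine.

```r
library(parallel)

# Number of logical cores (hardware threads) visible to R.
# detectCores() can return NA on some platforms, so guard against that.
n_logical <- detectCores(logical = TRUE)
if (is.na(n_logical)) n_logical <- 1L

# Upper bound on the total-CPU share that one fully busy thread can show.
single_thread_share <- 100 / n_logical

cat(sprintf("%d logical cores: one busy thread is at most about %.1f%% of total CPU\n",
            n_logical, single_thread_share))
```

If the figure printed here is in the same range as the usage seen in Task Manager, the code is most likely running on a single thread.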

Or you can use a different tool where someone has already done the work. I'm a big fan of ELKI, but it's mostly single-core. I think Julia may be worth a look because it is more similar to R (it is very similar to Matlab) and it was designed to make better use of multiple cores. Of course there may also be an R package that parallelizes this. I'd look at the Rcpp-based packages, which are usually very fast.

But the key to fast and scalable clustering is to avoid distance matrices. Consider: a 4-core system yields maybe a 3.5x speedup (often much less, because of Turbo Boost), and an 8-core system yields up to 6.5x better performance. But if you increase the data set size 10x, you need 100x as much memory and computation. This is a race that you cannot win, except with clever algorithms.
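To make the scaling argument concrete, here is a rough sketch in R. The first part is plain arithmetic on the size of a dist object for the 161976 rows in the question. The second part shows one way to avoid the full distance matrix: clara() from the cluster package, which runs PAM on random samples. This is a swapped-in technique, not something the answer names. The simulated data, k = 3, samples = 20, and sampsize = 500 are all illustrative assumptions; the real number of clusters would still have to be chosen for the actual data.

```r
library(cluster)  # provides clara() and pam()

# Bytes needed by a dist object: n * (n - 1) / 2 doubles of 8 bytes each.
dist_bytes <- function(n) n * (n - 1) / 2 * 8

n <- 161976                          # observations in the question
dist_bytes(n) / 1e9                  # roughly 105 GB for the distances alone
dist_bytes(10 * n) / dist_bytes(n)   # close to 100: 10x the rows, ~100x the memory

# CLARA runs PAM on small random samples and then assigns every observation
# to its nearest medoid, so the full distance matrix is never built.
x <- scale(matrix(rnorm(20000 * 11), ncol = 11))  # simulated stand-in data
fit <- clara(x, k = 3, metric = "euclidean",
             samples = 20, sampsize = 500)        # illustrative settings

table(fit$clustering)  # cluster sizes over all 20000 rows
```

The trade-off is that the medoids come from samples, so the result approximates what pam() would find on the full data; larger samples and sampsize values move it closer at the cost of run time.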




Answer 2:


Here is a quick example of using multiple CPU cores. The task has to be split up much like a for loop, but you cannot access any intermediate results for further calculations until the whole loop has finished executing.

library(doParallel)
registerDoParallel(cores = detectCores(all.tests = FALSE, logical = TRUE))

This would be a basic example of how you can split a task:

vec = c(1,3,5)
do = function(n) n^2
foreach(i = seq_along(vec)) %dopar% do(vec[i])

If packages are required within your do() function, you can load them in the following way:

foreach(i = seq_along(vec), .packages = c("pkg1", "pkg2")) %dopar% do(vec[i])  # "pkg1" and "pkg2" are placeholders for real package names
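For comparison, the same split-apply pattern can be sketched with only the parallel package that ships with R, instead of foreach and doParallel. This is an alternative to the code above, not a copy of it. The worker count of 2 and the function name square are arbitrary choices for illustration.

```r
library(parallel)

# Start two worker processes (2 is an arbitrary choice for illustration).
# The default cluster type works on Windows as well.
cl <- makeCluster(2)

vec <- c(1, 3, 5)
square <- function(n) n^2

# Each element is sent to a worker; results come back as a plain vector.
res <- parSapply(cl, vec, square)

# Release the worker processes once the work is done.
stopCluster(cl)

print(res)  # 1 9 25
```

As with foreach, the individual results only become available once the whole call has returned, so each piece of work must be independent of the others.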


Source: https://stackoverflow.com/questions/37034687/how-can-i-have-r-utilize-more-of-the-processing-power-on-my-pc
