Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?

失恋的感觉 2020-12-29 10:39

I am running the termstrc yield curve analysis package in R across 10 years of daily bond price data for 5 different countries. This is highly compute-intensive; it takes 32

3 answers
  • 2020-12-29 11:09

    Yes, you can. There are a number of ways. One of the easiest is to use Redis as a backend (as easy as calling sudo apt-get install redis-server on an Ubuntu machine; rumor has it that you could have a Redis backend on a Windows machine too).

    By using the doRedis package, you can very easily enqueue jobs on a task queue in Redis, and then use one, two, ... idle workers to query the queue. Best of all, you can easily mix operating systems, so yes, your co-workers' Windows machines qualify. Moreover, you can use one, two, three, ... clients as you see fit and scale up or down as needed. The queue does not know or care; it simply supplies jobs.

    Best of all, the vignette in the doRedis package has working examples of a mix of Linux and Windows clients to make a bootstrapping example go faster.
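
    As a rough illustration of the queue-and-workers setup described above, here is a minimal sketch. It assumes a Redis server is reachable at the given host and that the foreach and doRedis packages are installed on every machine. The queue name "yield_jobs", the helper names, and fit_one_day() are hypothetical placeholders (fit_one_day stands in for the real termstrc work on one day of data); check the argument names against the doRedis version you install.

```r
library(foreach)

# Hypothetical stand-in for one unit of work, e.g. fitting one day's curve.
fit_one_day <- function(day_index) {
  day_index^2
}

# Runs the jobs on whatever foreach backend is currently registered.
run_all_days <- function(day_indices) {
  foreach(d = day_indices, .combine = c, .export = "fit_one_day") %dopar%
    fit_one_day(d)
}

# Master side: register the Redis queue as the backend, then submit the jobs.
# Assumes a Redis server is running on `host`.
run_on_redis <- function(day_indices, queue = "yield_jobs", host = "localhost") {
  doRedis::registerDoRedis(queue, host = host)
  on.exit(doRedis::removeQueue(queue))  # clean up the queue when done
  run_all_days(day_indices)
}

# Worker side: run this on each co-worker's PC (any OS).
# It blocks and pulls jobs from the queue as they appear.
start_worker <- function(queue = "yield_jobs", host = "localhost") {
  doRedis::redisWorker(queue, host = host)
}
```

    On the master you would call something like run_on_redis(1:2500, host = "my-redis-host"), and on each idle PC start_worker(host = "my-redis-host"); the host name here is made up. Workers can be started or stopped at any time, since the queue simply hands out whatever jobs remain.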

  • 2020-12-29 11:17

    What about OpenCL?

    This would require rewriting the C code, but would allow potentially large speedups. The GPU has immense computing power.

  • 2020-12-29 11:29

    Perhaps not the answer you were looking for, but this is one of those situations where an alternative is so much better that it's hard to ignore.

    The cost of AWS clusters is ridiculously low for exactly these types of computing problems. You pay only for what you use. I can guarantee you that you will save money (at the very least in opportunity costs) by not spending the time trying to convert 12 Windows machines into a cluster. For your purposes, you could probably even do this for free. (IIRC, they still offer free computing time on clusters.)

    References:

    • Using AWS for parallel processing with R
    • http://blog.revolutionanalytics.com/2011/01/run-r-in-parallel-on-a-hadoop-cluster-with-aws-in-15-minutes.html
    • http://code.google.com/p/segue/
    • http://www.vcasmo.com/video/drewconway/8468
    • http://aws.amazon.com/ec2/instance-types/
    • http://aws.amazon.com/ec2/pricing/

    Some of these instances are so powerful that you probably wouldn't even need to figure out how to set up your work on a cluster (given your current description). As you can see from the references, costs are ridiculously low, ranging from $1 to $4 per hour of compute time.
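
    If a single large multi-core instance turns out to be enough, one possible approach is R's built-in parallel package, with no cluster of machines involved. The sketch below is only an assumption about how the work could be split: fit_one_day() is a hypothetical placeholder for the real per-day termstrc fit, and the job list 1:10 is made up.

```r
library(parallel)

# Hypothetical stand-in for one unit of work, e.g. fitting one day's curve.
fit_one_day <- function(day_index) {
  day_index^2
}

# Leave one core free; detectCores() can return NA on some platforms.
cores <- detectCores()
if (is.na(cores)) cores <- 2L
n_workers <- max(1L, cores - 1L)

# A socket (PSOCK) cluster of local R processes; this type also works on Windows.
cl <- makeCluster(n_workers)
results <- parLapply(cl, 1:10, fit_one_day)
stopCluster(cl)
```

    Each element of results holds the value returned for one job, in the same order as the input.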
