Is Hadoop right for running my simulations?

Asked by 花落未央 on 2021-02-02 14:20

I have written a stochastic simulation in Java, which loads data from a few CSV files on disk (totaling about 100MB) and writes results to another output file (not much data). Is Hadoop the right tool for running many of these simulations in parallel on a cluster?

5 Answers
  • 2021-02-02 14:52

    Hadoop can be made to run your simulation if you already have a Hadoop cluster, but it's not the best tool for the kind of application you are describing. Hadoop is built to make working with big data possible, and you don't have big data -- you have big computation.
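    The shape of the workload is the point here: the same fan-out a cluster would do can be sketched on one machine with plain java.util.concurrent, and a distributed framework only needs to scale that same pattern across machines. A minimal sketch, where runSimulation is a hypothetical stand-in for the actual simulation code:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class SimDriver {

            // Hypothetical stand-in for one stochastic simulation run.
            static double runSimulation(long seed) {
                return new java.util.Random(seed).nextGaussian();
            }

            public static void main(String[] args) throws Exception {
                int runs = 10_000;
                ExecutorService pool = Executors.newFixedThreadPool(
                        Runtime.getRuntime().availableProcessors());
                List<Future<Double>> results = new ArrayList<>();
                for (long seed = 0; seed < runs; seed++) {
                    final long s = seed;
                    results.add(pool.submit(() -> runSimulation(s))); // one task per run
                }
                double sum = 0;
                for (Future<Double> f : results) {
                    sum += f.get(); // blocks until that run finishes
                }
                pool.shutdown();
                System.out.println("mean result over " + runs + " runs: " + sum / runs);
            }
        }

    Whatever distributes those runSimulation calls across machines -- Gearman, a grid scheduler, or Hadoop -- is doing job distribution, not data processing.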

    I like Gearman (http://gearman.org/) for this sort of thing.

  • 2021-02-02 14:53

    Simply put: although Hadoop may solve your problem here, it's not the right tool for your purpose.

  • 2021-02-02 14:56

    I see a number of answers here that are basically saying, "no, you shouldn't use Hadoop for simulations because it wasn't built for simulations." I believe this is a rather short-sighted view, akin to someone saying in 1985, "you can't use a PC for word processing; PCs are for spreadsheets!"

    Hadoop is a fantastic framework for building a simulation engine. I've been using it for this purpose for months and have had great success with small-data / large-computation problems. Here are the top five reasons I migrated to Hadoop for simulation (using R as my language for simulations, btw); a sketch of the mapper-as-simulation pattern follows the list:

    1. Access: I can lease Hadoop clusters through Amazon Elastic MapReduce, so I don't have to invest any time or energy in the administration of a cluster. This meant I could actually start doing simulations on a distributed framework without having to get administrative approval in my org!
    2. Administration: Hadoop handles job-control issues, such as node failure, transparently. I don't have to code for these conditions: if a node fails, Hadoop makes sure the sims scheduled for that node get run on another node.
    3. Upgradeable: Hadoop is a rather generic MapReduce engine with a great distributed file system, so if you later hit problems that do involve large data, you don't have to migrate to a new solution. Hadoop gives you a simulation platform that will also scale to a large-data platform for (nearly) free!
    4. Support: Being open source and used by so many companies, Hadoop has abundant resources, both online and off. Many of those resources are written with the assumption of "big data," but they are still useful for learning to think in a MapReduce way.
    5. Portability: I have built analyses on top of proprietary engines using proprietary tools, which took considerable learning to get working. When I later changed jobs and found myself at a firm without that same proprietary stack, I had to learn a new set of tools and a new simulation stack. Never again. I traded in SAS for R and our old grid framework for Hadoop. Both are open source, and I know that I can land at any job in the future and immediately have tools at my fingertips to start kicking ass.
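    To make the pattern concrete, here is a minimal map-only Hadoop job sketch in Java (the question's language; I use R via streaming, but the idea is identical): each input line carries one parameter set, and the mapper runs one simulation per line. The Simulation class is a hypothetical stand-in for your own simulation code.

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.DoubleWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class SimJob {

            // Hypothetical stand-in for the real stochastic simulation.
            static class Simulation {
                static double run(String params) {
                    return new java.util.Random(params.hashCode()).nextGaussian();
                }
            }

            public static class SimMapper
                    extends Mapper<LongWritable, Text, Text, DoubleWritable> {
                @Override
                protected void map(LongWritable offset, Text line, Context ctx)
                        throws IOException, InterruptedException {
                    // One input line = one parameter set = one simulation run.
                    String params = line.toString();
                    ctx.write(new Text(params), new DoubleWritable(Simulation.run(params)));
                }
            }

            public static void main(String[] args) throws Exception {
                Job job = Job.getInstance(new Configuration(), "simulation");
                job.setJarByClass(SimJob.class);
                job.setMapperClass(SimMapper.class);
                job.setNumReduceTasks(0); // map-only: each run's result is written directly
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(DoubleWritable.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));   // file of parameter lines
                FileOutputFormat.setOutputPath(job, new Path(args[1])); // results directory
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }

    If you need to aggregate across runs (say, average the outputs per parameter set), you add a reducer instead of setting the reduce count to zero; the simulation code itself does not change.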
  • 2021-02-02 14:59

    While you might be able to get by using MapReduce with Hadoop, it seems like what you're doing might be better suited for a grid/job scheduler such as Condor or Sun Grid Engine. Hadoop is more suited for doing something where you take a single (very large) input, split it into chunks for your worker machines to process, and then reduce it to produce an output.
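    For a sense of what the scheduler route looks like, here is a hypothetical HTCondor submit description queuing 100 independent runs; the jar name, arguments, and file names are assumptions, not part of the question:

        # Hypothetical HTCondor submit file: 100 independent simulation runs
        universe   = vanilla
        executable = /usr/bin/java
        arguments  = -jar sim.jar --run $(Process)
        output     = sim_$(Process).out
        error      = sim_$(Process).err
        log        = sim.log
        queue 100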

  • Since you are already using Java, I suggest taking a look at GridGain, which I think is particularly well suited to your problem.
