Improve h2o DRF runtime on a multi-node cluster

情歌与酒
情歌与酒 2021-01-16 03:37

I am currently running h2o's DRF algorithm on a 3-node EC2 cluster (the h2o server spans all 3 nodes). My data set has 1m rows and 41 columns (40 predic

2 answers
  •  无人及你
    2021-01-16 03:55

    If your Random Forest is slower on a multi-node H2O cluster, it most likely means that your dataset is not big enough to take advantage of distributed computing. There is overhead in communicating between cluster nodes, so if you can train your model successfully on a single node, then using a single node will generally be faster.

    Multi-node is designed for when your data is too big to train on a single node. Only then will it be worth using multiple nodes. Otherwise, you are just adding communication overhead for no reason and will see the kind of slowdown that you observed.
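
As a rough way to check whether a data set is "too big", here is a back-of-the-envelope sketch. It assumes every cell is an 8-byte numeric value (H2O compresses columns, so the real footprint is usually smaller) and that you want the node's RAM to be about 4x the data size. Both the 8 bytes and the 4x headroom are rule-of-thumb assumptions, and the function names are made up for illustration.

```python
def estimate_numeric_footprint_gb(rows: int, cols: int, bytes_per_cell: int = 8) -> float:
    """Uncompressed size in GB, assuming every cell is a fixed-width numeric value."""
    return rows * cols * bytes_per_cell / 1e9


def fits_on_single_node(rows: int, cols: int, node_ram_gb: float, headroom: float = 4.0) -> bool:
    """True if the estimated footprint times the headroom factor fits in the node's RAM.

    headroom=4.0 is an assumed rule of thumb, not a hard limit.
    """
    return estimate_numeric_footprint_gb(rows, cols) * headroom <= node_ram_gb


if __name__ == "__main__":
    # The data set from the question: 1m rows, 41 columns.
    size_gb = estimate_numeric_footprint_gb(1_000_000, 41)
    print(f"estimated footprint: {size_gb:.3f} GB")
    print("fits on a 16 GB node:", fits_on_single_node(1_000_000, 41, node_ram_gb=16))
```

By this estimate the data set in the question is well under 1 GB, so it should fit comfortably on one node.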

    If your data fits into memory on a single machine (and you can successfully train a model without running out of memory), the way to speed up your training is to switch to a machine with more cores. You can also experiment with the parameters that affect training speed to see if you can get a speed-up, but that usually comes at a cost in model performance.
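
A minimal sketch of the single-node setup with the h2o Python client, for illustration only. The file path, target column, helper names and the specific parameter values are all placeholders I picked, not recommendations. `ntrees`, `max_depth` and `sample_rate` are real DRF parameters; lowering them cuts the work per model and will usually cost some accuracy.

```python
def drf_params(faster: bool = False) -> dict:
    """Return DRF settings; faster=True trades model quality for training speed.

    The numbers are illustrative baseline values, not tuned ones.
    """
    params = {"ntrees": 50, "max_depth": 20, "sample_rate": 0.632, "seed": 42}
    if faster:
        # Fewer, shallower trees built on fewer rows each: less work per model.
        params.update(ntrees=30, max_depth=12, sample_rate=0.5)
    return params


def train_single_node(path: str, target: str, faster: bool = False):
    """Train DRF on a local single-node H2O instance using all available cores."""
    # Imported here so the parameter helper above works without h2o installed.
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init(nthreads=-1)  # -1 means use every core on this machine
    frame = h2o.import_file(path)
    predictors = [c for c in frame.columns if c != target]
    model = H2ORandomForestEstimator(**drf_params(faster))
    model.train(x=predictors, y=target, training_frame=frame)
    return model


if __name__ == "__main__":
    print(drf_params())
    print(drf_params(faster=True))
    # model = train_single_node("data.csv", "label", faster=True)  # hypothetical file and column
```

Time both settings on your own data and compare the validation metrics before settling on the faster one.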
