Autoscaling EMR- is it required? Should I just use EC2? Should I just use Qubole?

前端 未结 2 1343
野的像风
野的像风 2021-02-11 07:48

In order to reduce the time for provisioning, we\'ve decided to keep up a dedicated EMR cluster with 5 instances (we expect to need about 5). In case we need more, we think we\'

2条回答
  •  难免孤独
    2021-02-11 08:07

    The page you linked showed ways of either manually or programmatically increasing the nodes in your cluster. I couldn't find anything else about autoscaling for EMR.

    Unless we're missing some facts, you’d still have to come up with your own scaling algorithm and process. If you’re taking factors into account such as your job backlog, the units of time you’re paying for, the use of less-expensive “spot” instances, multiple clusters, etc, this is probably not a trivial exercise.

    In addition to increasing size of your cluster, there is also downsizing. EMR allows this (manually or programmatically) for task nodes, but they state they don't for core nodes. You'd have to terminate the core node through AWS functionality and risk losing data. If your workloads increase and decrease over time, core node downsizing would be valuable for keeping your costs lower.

    Qubole automatically takes care of all of these things out of the box. You run your jobs from the UI or API and it starts, sizes or resizes the cluster. When you're finished, it downsizes or terminates the cluster. It also allows you to have a minimum number of nodes constantly running at one time. I've also heard that the startup time for Qubole nodes is significantly faster than EMR.

    Hope this helps you.

提交回复
热议问题