Autoscaling EMR: is it required? Should I just use EC2? Should I just use Qubole?

野的像风 2021-02-11 07:48

To reduce provisioning time, we've decided to keep a dedicated EMR cluster running with 5 instances (we expect to need about 5). In case we need more, we think we'

2 Answers

难免孤独 (original poster)

2021-02-11 08:07

The page you linked shows ways of increasing the number of nodes in your cluster, either manually or programmatically. I couldn't find anything else about autoscaling for EMR.

Unless we're missing some facts, you'd still have to come up with your own scaling algorithm and process. If you're taking into account factors such as your job backlog, the units of time you're paying for, the use of less expensive "spot" instances, multiple clusters, etc., this is probably not a trivial exercise.
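
To give a feel for the simplest possible version of such a scaling rule, here is a minimal sketch in Python. It looks only at the job backlog. The names `jobs_per_node`, `min_nodes` and `max_nodes` are hypothetical tuning parameters made up for illustration, not anything EMR provides, and a real policy would also have to weigh billing units, spot pricing and the other factors mentioned above.

```python
import math


def target_node_count(backlog_jobs, jobs_per_node, min_nodes, max_nodes):
    """Return how many nodes a backlog-only scaling rule would ask for.

    All parameters are illustrative assumptions:
      backlog_jobs  - number of jobs currently waiting
      jobs_per_node - how many queued jobs one node is expected to absorb
      min_nodes     - floor that is always kept running (e.g. the dedicated 5)
      max_nodes     - ceiling that caps spend
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")

    # Nodes needed to cover the backlog, rounded up to whole nodes.
    needed = math.ceil(backlog_jobs / jobs_per_node)

    # Clamp the result between the floor and the ceiling.
    return max(min_nodes, min(max_nodes, needed))
```

With a floor of 5 and a ceiling of 20, an empty backlog keeps the cluster at 5 nodes, and a very large backlog is capped at 20. Everything that makes the problem hard (when to act on the number, and how to shrink safely) is still left to you.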

Besides increasing the size of your cluster, there is also downsizing. EMR allows this (manually or programmatically) for task nodes, but AWS states that it does not allow it for core nodes. You'd have to terminate the core node through other AWS functionality and risk losing data. If your workloads increase and decrease over time, core node downsizing would be valuable for keeping your costs lower.
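
As a rough illustration of the task/core distinction, the sketch below builds a resize request and refuses to shrink a core group, mirroring the restriction described above. The function name and its arguments are hypothetical. The returned dictionary is shaped like the keyword arguments of boto3's EMR `modify_instance_groups` call, but nothing is sent to AWS here, and the IDs in the usage lines are placeholders.

```python
def build_resize_request(cluster_id, group_id, group_type,
                         current_count, desired_count):
    """Build a resize request for one instance group (illustrative only).

    group_type is assumed to be "TASK" or "CORE". Shrinking a CORE group
    is rejected, following the restriction described in the text.
    """
    if desired_count < 0:
        raise ValueError("desired_count must not be negative")
    if group_type == "CORE" and desired_count < current_count:
        raise ValueError("refusing to shrink a core instance group")

    # Shaped like the kwargs for boto3's emr.modify_instance_groups(...).
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": group_id, "InstanceCount": desired_count},
        ],
    }


# Placeholder IDs; shrinking a task group from 5 to 2 nodes is accepted.
request = build_resize_request("j-EXAMPLE", "ig-EXAMPLE", "TASK", 5, 2)
```

Keeping the guard in your own code means a buggy scaling rule fails loudly instead of attempting a core downsize.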

Qubole takes care of all of these things automatically, out of the box. You run your jobs from the UI or API, and it starts, sizes, or resizes the cluster. When you're finished, it downsizes or terminates the cluster. It also lets you keep a minimum number of nodes running at all times. I've also heard that the startup time for Qubole nodes is significantly shorter than for EMR.

Hope this helps you.
