amazon-emr

Amazon EMR - What is the need for Task nodes when we have Core nodes?

白昼怎懂夜的黑 submitted on 2020-12-05 19:56:31
Question: Hi guys, I've been learning about Amazon EMR lately, and as far as I know an EMR cluster lets us choose three node types: Master, which runs the primary Hadoop daemons like NameNode, JobTracker and ResourceManager; Core, which runs the DataNode and TaskTracker daemons; and Task, which runs only the TaskTracker. My question is: why does EMR provide Task nodes, whereas Hadoop suggests that we should have the DataNode daemon and the TaskTracker daemon on the same node? What is Amazon's logic behind

Spark + EMR using Amazon's “maximizeResourceAllocation” setting does not use all cores/vcores

强颜欢笑 submitted on 2020-08-20 18:01:06
Question: I'm running an EMR cluster (version emr-4.2.0) for Spark using the Amazon-specific maximizeResourceAllocation flag as documented here. According to those docs, "this option calculates the maximum compute and memory resources available for an executor on a node in the core node group and sets the corresponding spark-defaults settings with this information". I'm running the cluster using m3.2xlarge instances for the worker nodes. I'm using a single m3.xlarge for the YARN master - the smallest