Question
How can we get the overall memory used by a Spark job? I am not able to find the exact metric to refer to for this. I have looked at the Spark UI but am not sure which field to use. In Ganglia we have the following options: a) Memory Buffer, b) Cache Memory, c) Free Memory, d) Shared Memory, e) Free Swap Space.
None of these seems to correspond to the overall memory used. Does anyone have an idea about this?
Answer 1:
If you persist your RDDs you can see how big they are in memory via the UI.
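A minimal sketch of that approach, assuming a Scala Spark application (the input path and storage level are illustrative): persist the RDD and run an action so it is actually cached, after which its size shows up under the Storage tab of the Spark UI. `SparkContext.getExecutorMemoryStatus` and `SparkContext.getRDDStorageInfo` (the latter is a developer API) also give a programmatic view of cache memory usage.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object MemoryCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("memory-check").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; any RDD works the same way.
    val rdd = sc.textFile("hdfs:///data/input").persist(StorageLevel.MEMORY_ONLY)
    rdd.count() // an action forces the RDD to actually be cached

    // Per-executor view: (max memory available for caching, remaining memory) in bytes.
    sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
      println(s"$executor: used=${max - remaining} max=$max")
    }

    // Per-RDD view of cached size (developer API, may change between versions).
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.memSize} bytes in memory")
    }

    spark.stop()
  }
}
```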
It's hard to get an idea of how much memory is being used for intermediate tasks (e.g. for shuffles). Basically Spark will use as much memory as it needs given what's available. This means that if your RDDs take up more than 50% of your available resources, your application might slow down because there are fewer resources available for execution.
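For context on that storage/execution split, here is a hedged configuration sketch (assuming Spark 1.6+ unified memory management; the values shown match the documented defaults but are only illustrative): `spark.memory.fraction` sets the share of the heap given to the unified storage + execution region, and `spark.memory.storageFraction` is the portion of that region shielded from eviction for cached data, with execution able to borrow the rest.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative values; 0.6 and 0.5 are the documented defaults in recent Spark versions.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")        // heap share for execution + storage
  .set("spark.memory.storageFraction", "0.5") // part of that region protected for cached RDDs

val spark = SparkSession.builder()
  .appName("memory-tuning-sketch")
  .config(conf)
  .getOrCreate()
```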
Source: https://stackoverflow.com/questions/39615702/monitoring-the-memory-usage-of-spark-jobs