I am running a Kinesis plus Spark application (https://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html).
I am running it as below, with the command issued on an EC2 instance.
I had a small cluster where the resources were limited (~3 GB per node). I solved this problem by changing the minimum memory allocation to a sufficiently low number.
From:
yarn.scheduler.minimum-allocation-mb: 1g
yarn.scheduler.increment-allocation-mb: 512m
To:
yarn.scheduler.minimum-allocation-mb: 256m
yarn.scheduler.increment-allocation-mb: 256m
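For reference, on a plain Hadoop install these keys are usually set in yarn-site.xml, with the values given as plain megabyte numbers; increment-allocation-mb is, as far as I know, specific to the Fair Scheduler. A minimal sketch of the change:

<!-- yarn-site.xml: allow YARN to grant containers as small as 256 MB -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<!-- Fair Scheduler: round container requests up in 256 MB steps -->
<property>
  <name>yarn.scheduler.increment-allocation-mb</name>
  <value>256</value>
</property>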
I had the same problem on a local Hadoop cluster with Spark 1.4 and YARN while trying to run spark-shell. It had more than enough resources.
What helped was running the same thing from an interactive LSF job on the cluster. So perhaps there were some network limitations to running YARN from the head node...
When running with yarn-cluster, all the application logging and stdout end up in the assigned YARN application master and will not appear in spark-submit's output. Also, being a streaming application, it usually does not exit. Check the Hadoop ResourceManager web interface and look at the Spark web UI and logs, which are available from the Hadoop UI.
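If log aggregation is enabled, the same container logs can also be pulled from the command line once you know the application id (the id below is a placeholder):

yarn logs -applicationId <application_id>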
In one instance, I had this issue because I was asking for too many resources. This was on a small standalone cluster. The original command was
spark-submit --driver-memory 4G --executor-memory 7G --class "my.class" --master yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead=<value> my.jar
I succeeded in getting past 'Accepted' and into 'Running' by changing to
spark-submit --driver-memory 1G --executor-memory 3G --class "my.class" --master yarn --deploy-mode cluster --conf spark.yarn.executor.memoryOverhead=<value> my.jar
In other instances, I had this problem because of the way the code was written. We instantiated the Spark context inside the class where it was used, and it never got closed. We fixed the problem by instantiating the context first, passing it to the class where the data is parallelized etc., and then closing the context (sc.close()) in the caller class.
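A rough Scala sketch of that shape (class and method names are hypothetical, and I use stop(), the SparkContext shutdown call in the Scala API):

import org.apache.spark.{SparkConf, SparkContext}

object Main {
  def main(args: Array[String]): Unit = {
    // Create the context once in the caller, not inside the class that uses it.
    val sc = new SparkContext(new SparkConf().setAppName("my-app"))
    try {
      new MyJob(sc).run()
    } finally {
      // Shut the context down so YARN releases the application's containers.
      sc.stop()
    }
  }
}

class MyJob(sc: SparkContext) {
  def run(): Unit = {
    val rdd = sc.parallelize(1 to 100)
    println(rdd.count())
  }
}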
I hit the same problem on an MS Azure HDInsight Spark cluster.
I finally found out that the issue was the cluster not being able to talk back to the driver.
I assume you used client mode when submitting the job, since you can see this debug log.
The reason is that Spark executors have to talk to the driver program, and the TCP connection has to be bi-directional. So if your driver program is running in a VM (e.g. an EC2 instance) that is not reachable via hostname or IP (which you have to specify in the Spark conf; it defaults to the hostname), your status will stay ACCEPTED forever.
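If you need to stay in client mode, one option (a sketch; the address, port, class and jar below are placeholders) is to tell Spark explicitly which address and port the executors should dial back to, via the standard spark.driver.host and spark.driver.port settings:

spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.host=<address reachable from the cluster nodes> \
  --conf spark.driver.port=<port open to the cluster> \
  --class my.Class my.jar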
I got this error in this situation:
Logs for container_1453825604297_0001_02_000001 (from ResourceManager web UI):
16/01/26 08:30:38 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/01/26 08:31:41 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.180:33074, retrying ...
16/01/26 08:32:44 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.180:33074, retrying ...
16/01/26 08:32:45 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
I worked around it by using yarn cluster mode: MASTER=yarn-cluster.
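With spark-submit in Spark 1.x the equivalent is the following (class and jar names are placeholders; --master yarn --deploy-mode cluster, as in the commands earlier, also works):

spark-submit --master yarn-cluster --class my.Class my.jar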
On another computer which is configured in a similar way, but whose IP is reachable from the cluster, both yarn-client and yarn-cluster work.
Others may encounter this error for different reasons; my point is that checking the error logs (not visible from the terminal, but in the ResourceManager web UI in this case) almost always helps.