Spark Job submitted - Waiting (TaskSchedulerImpl : Initial job not accepted)

借酒劲吻你 2020-12-06 03:14

An API call was made to submit the job. The response states that it is Running.

On Cluster UI -

Worker (slave) - worker-20160712083825-172.31.17.189-59433

2 Answers
  • 2020-12-06 03:23

    You can take a look at my answer to a similar question, "Apache Spark on Mesos: Initial job has not accepted any resources":

    While most of the other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to highlight that a firewall can cause exactly the same issue, especially when you are running Spark on a cloud platform.

    If you can find the Spark slaves in the web UI, you have probably opened the standard ports 8080, 8081, 7077, and 4040. Nonetheless, when you actually run a job, it uses SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port, which by default are randomly assigned. If your firewall is blocking these ports, the master cannot retrieve any job-specific response from the slaves and returns the error.

    You can run a quick test by opening all the ports and seeing whether the slaves accept jobs.
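
    If opening everything confirms a firewall problem, a more durable fix is to pin the normally-random ports to fixed values so your firewall rules can allow them explicitly. A minimal sketch (the port numbers below are arbitrary examples, not recommendations):

    ```
    # spark-defaults.conf: fix the driver- and executor-side ports
    spark.driver.port        40000
    spark.blockManager.port  40010
    spark.port.maxRetries    16    # how many successive ports to try if one is taken

    # spark-env.sh: fix the worker's own port
    export SPARK_WORKER_PORT=40020
    ```

    With the ports pinned, the firewall only needs to allow this small range in addition to the standard UI and master ports.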

  • 2020-12-06 03:35

    I also have the same issue. Below are my observations when it occurs.

    1:17:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

    I noticed that it only occurs during the first query from the Scala shell, where I run something that fetches data from HDFS.

    When the problem occurs, the web UI states that there are no running applications.

    URL: spark://spark1:7077
    REST URL: spark://spark1:6066 (cluster mode)
    Alive Workers: 4
    Cores in use: 26 Total, 26 Used
    Memory in use: 52.7 GB Total, 4.0 GB Used
    Applications: 0 Running, 0 Completed
    Drivers: 0 Running, 0 Completed 
    Status: ALIVE
    

    It seems that something fails to start, but I can't tell exactly what it is.

    However, restarting the cluster a second time sets the Applications value to 1 and everything works well.

    URL: spark://spark1:7077
    REST URL: spark://spark1:6066 (cluster mode)
    Alive Workers: 4
    Cores in use: 26 Total, 26 Used
    Memory in use: 52.7 GB Total, 4.0 GB Used
    Applications: 1 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE
    

    I'm still investigating; this quick workaround can save time until a final solution is found.
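
    Before resorting to a restart, it can help to check programmatically whether the master actually sees a registered application. A hedged sketch, assuming a standalone master whose web UI runs on spark1:8080 (the standalone master serves its status as JSON at /json; the hostname is an assumption from the URLs above):

    ```shell
    # Fetch the standalone master's status page as JSON (replace spark1
    # with your master's hostname).
    master_status() {
      curl -s "http://${1:-spark1}:8080/json"
    }

    # Print "<alive workers> <running apps>" from the JSON payload on stdin.
    summarize() {
      python3 -c '
    import json, sys
    s = json.load(sys.stdin)
    alive = sum(1 for w in s.get("workers", []) if w.get("state") == "ALIVE")
    print(alive, len(s.get("activeapps", [])))
    '
    }

    # Usage: master_status spark1 | summarize
    # "4 0" would match the broken state above; "4 1" the healthy one.
    ```

    If the application count stays at 0 after a submit, the driver never registered with the master, which points back at the resource or firewall issues discussed in the other answer.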
