apache-spark-standalone

Spark: slave unable to connect to master

寵の児 submitted on 2020-02-06 19:33:29
Question: I am trying to set up a standalone Spark cluster on 2 machines within my organization's network. Both are Ubuntu 16.04 machines with the same configuration. Passwordless SSH is set up from master to slave and from slave to master as well. The configuration of the master and the slave node is given below. Master configuration: Spark version: 2.4.4 /etc/hosts : 10.x.x.2 master 10.x.x.3 slave01 127.0.0.1 dm.abc.net localhost dm 127.0.1.1 10.x.x.4 dm.abc.net dm 10.x.x.5 abc.net /usr/local/spark/conf
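
For a question like this, a minimal sketch of a two-node standalone layout usually frames the answer; the sketch below reuses the install path and host names from the excerpt, but the exact values are illustrative assumptions, not taken from the asker's files:

    # conf/spark-env.sh on the master node (illustrative values); binding to the LAN
    # address avoids the master listening only on Ubuntu's 127.0.1.1 loopback entry
    export SPARK_MASTER_HOST=10.x.x.2
    export SPARK_MASTER_PORT=7077

    # start the master, then start the worker on slave01 against the master URL
    /usr/local/spark/sbin/start-master.sh
    /usr/local/spark/sbin/start-slave.sh spark://10.x.x.2:7077

If the worker registers successfully, it appears under Workers on the master web UI (port 8080 by default).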

Running Spark driver program in Docker container - no connection back from executor to the driver?

邮差的信 submitted on 2020-01-22 10:16:48
Question: UPDATE: The problem is resolved. The Docker image is here: docker-spark-submit. I run spark-submit with a fat jar inside a Docker container. My standalone Spark cluster runs on three virtual machines: one master and two workers. From an executor log on a worker machine, I see that the executor has the following driver URL: "--driver-url" "spark://CoarseGrainedScheduler@172.17.0.2:5001" 172.17.0.2 is actually the address of the container with the driver program, not the host machine where the
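
The usual shape of the fix for this class of problem is to advertise an address the workers can actually reach and to pin the driver ports so they can be published from the container. A hedged sketch follows; the image name, host IP, class, and jar are placeholders, and only port 5001 comes from the excerpt:

    # publish fixed driver ports from the container, then advertise the Docker *host*
    # address (here 10.0.0.5, illustrative) so executors connect back through it
    docker run --rm -p 5001:5001 -p 5002:5002 myorg/docker-spark-submit \
      spark-submit \
      --master spark://spark-master:7077 \
      --deploy-mode client \
      --conf spark.driver.host=10.0.0.5 \
      --conf spark.driver.bindAddress=0.0.0.0 \
      --conf spark.driver.port=5001 \
      --conf spark.driver.blockManager.port=5002 \
      --class com.example.Main app-assembly.jar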

Forcing driver to run on specific slave in spark standalone cluster running with “--deploy-mode cluster”

柔情痞子 submitted on 2020-01-04 04:02:29
Question: I am running a small Spark cluster with two EC2 instances (m4.xlarge). So far I have been running the Spark master on one node and a single Spark slave (4 cores, 16 GB memory) on the other, then deploying my Spark (streaming) app in client deploy mode on the master. A summary of the settings is: --executor-memory 16g --executor-cores 4 --driver-memory 8g --driver-cores 2 --deploy-mode client This results in a single executor on my single slave running with 4 cores and 16 GB memory. The driver runs
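
For context, the cluster deploy-mode counterpart of that submission looks roughly like the sketch below (class and jar names are placeholders). In standalone cluster mode the master chooses a worker with enough free memory and cores for the driver, so the driver resource flags, and which workers can satisfy them, are the main indirect lever over where it lands:

    # illustrative cluster-mode submission; com.example.StreamingApp and app.jar are placeholders
    spark-submit \
      --master spark://<master-address>:7077 \
      --deploy-mode cluster \
      --driver-memory 8g \
      --driver-cores 2 \
      --executor-memory 16g \
      --executor-cores 4 \
      --class com.example.StreamingApp \
      app.jar

Note that in cluster mode the application jar has to be reachable from whichever worker ends up hosting the driver, for example via a path that exists on every node.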

Spark standalone connection driver to worker

十年热恋 submitted on 2019-12-29 07:21:18
Question: I'm trying to host a Spark standalone cluster locally. I have two heterogeneous machines connected on a LAN. Each piece of the architecture listed below is running in Docker. I have the following configuration: master on machine 1 (port 7077 exposed), worker on machine 1, driver on machine 2. I use a test application that opens a file and counts its lines. The application works when the file is replicated on all workers and I use SparkContext.readText(), but when the file is only present on
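
A hedged sketch of two common workarounds for the local-file case: with a file:// path, every JVM that reads the data needs the file at the same path on its own filesystem, so either replicate it or ship it with the job (host names, paths, class, and jar below are assumptions):

    # Option A: copy the file to the same path on every machine that runs an executor
    for host in worker1 worker2; do
      scp /data/input.txt "$host":/data/input.txt
    done

    # Option B: ship a small file with the application and resolve it via SparkFiles.get
    spark-submit --master spark://master-host:7077 \
      --files /data/input.txt \
      --class com.example.LineCount app.jar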

Apache Spark: Differences between client and cluster deploy modes

て烟熏妆下的殇ゞ submitted on 2019-12-28 03:18:41
Question: TL;DR: In a Spark Standalone cluster, what are the differences between client and cluster deploy modes? How do I set which mode my application is going to run in? We have a Spark Standalone cluster with three machines, all of them with Spark 1.6.1: a master machine, which is also where our application is run using spark-submit, and 2 identical worker machines. From the Spark documentation, I read: (...) For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver
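
The mode itself is chosen with the --deploy-mode flag of spark-submit, with client as the default; a minimal sketch with placeholder class and jar names:

    # client mode: the driver runs on the machine where spark-submit is invoked
    spark-submit --master spark://master:7077 --deploy-mode client \
      --class com.example.App app.jar

    # cluster mode: the standalone master launches the driver on one of the workers
    spark-submit --master spark://master:7077 --deploy-mode cluster \
      --class com.example.App app.jar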

Apache Spark method not found sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;

百般思念 submitted on 2019-12-25 00:21:19
Question: I encounter this problem while running an automated data processing script in spark-shell. The first couple of iterations work fine, but sooner or later it always bumps into this error. I googled this issue but haven't found an exact match; other similar issues are outside of the Spark context. I guess it may have something to do with the JVM version, but I cannot figure out how to solve the problem. I use 2 machines within a Spark standalone cluster. Machine No.1 Java information: java 10.0.2 2018-07
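
sun.nio.ch.DirectBuffer.cleaner() returning a sun.misc.Cleaner is a Java 8 internal that changed in Java 9+, and Spark releases of that generation target Java 8, so the usual hedged fix is to pin the JVM Spark runs on; the path below is an assumption for an Ubuntu OpenJDK 8 install:

    # conf/spark-env.sh on every node (JAVA_HOME path is illustrative)
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export PATH="$JAVA_HOME/bin:$PATH"

    # confirm which JVM will actually be picked up
    java -version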

spark-submit on standalone cluster complain about scala-2.10 jars not exist

☆樱花仙子☆ submitted on 2019-12-24 06:36:04
Question: I'm new to Spark and downloaded the pre-compiled Spark binaries from Apache (Spark-2.1.0-bin-hadoop2.7). When submitting my Scala (2.11.8) uber jar, the cluster throws an error: java.lang.IllegalStateException: Library directory '/root/spark/assembly/target/scala-2.10/jars' does not exist; make sure Spark is built I'm not running Scala 2.10, and Spark isn't compiled (as far as I know) with Scala 2.10. Could it be that one of my dependencies is based on Scala 2.10? Any suggestions what can be
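
The path in that message (assembly/target/scala-2.10/jars) is the layout of a Spark source checkout, not of a pre-built package, which usually points at a stale SPARK_HOME or spark.home being picked up somewhere; a hedged way to check, with paths and names as assumptions:

    # paths are illustrative
    echo "$SPARK_HOME"                        # should point at the extracted pre-built package
    ls /opt/spark-2.1.0-bin-hadoop2.7/jars    # a pre-built distribution keeps its jars here

    # point SPARK_HOME at the pre-built package on every node and resubmit
    export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
    "$SPARK_HOME"/bin/spark-submit --master spark://master:7077 --class com.example.App app.jar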

How multiple executors are managed on the worker nodes with a Spark standalone cluster?

柔情痞子 submitted on 2019-12-22 00:13:03
Question: Until now, I have only used Spark on a Hadoop cluster with YARN as the resource manager. In that type of cluster, I know exactly how many executors to run and how the resource management works. However, now that I am trying to use a standalone Spark cluster, I have become a little confused. Correct me where I am wrong. From this article, by default, a worker node uses all the memory of the node minus 1 GB. But I understand that by using SPARK_WORKER_MEMORY, we can use less memory. For
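
As a rough illustration of the two layers involved (all numbers below are assumptions, not values from the question): conf/spark-env.sh caps what each worker offers, and the spark-submit flags decide how that capacity is carved into executors for one application:

    # conf/spark-env.sh on each worker: cap what the worker advertises (illustrative)
    export SPARK_WORKER_CORES=14
    export SPARK_WORKER_MEMORY=28g

    # per-application sizing; with these values a 14-core / 28g worker can host up to
    # three 4-core / 8g executors for this application
    spark-submit --master spark://master:7077 \
      --executor-cores 4 \
      --executor-memory 8g \
      --total-executor-cores 24 \
      --class com.example.App app.jar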

Continuously INFO JobScheduler:59 - Added jobs for time *** ms in my Spark Standalone Cluster

a 夏天 submitted on 2019-12-19 09:58:17
Question: We are working with a Spark Standalone cluster of 3 nodes, each with 8 cores and 32 GB RAM and the same configuration. Sometimes a streaming batch completes in less than 1 second; sometimes it takes more than 10 seconds, and at that time the log below appears in the console. 2016-03-29 11:35:25,044 INFO TaskSchedulerImpl:59 - Removed TaskSet 18.0, whose tasks have all completed, from pool 2016-03-29 11:35:25,044 INFO DAGScheduler:59 - Job 18 finished: foreachRDD at EventProcessor.java:87, took 1.128755 s
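
That INFO line on its own only says a batch was queued; it becomes a symptom when batches are generated faster than they finish, often because the application has too few cores left once receivers are accounted for. A hedged sketch of the kind of sizing and backpressure settings that keep the queue from growing (numbers, class, and jar names are assumptions):

    # keep total cores comfortably above the number of receivers so batches can run;
    # values are illustrative for a 3-node, 8-core-per-node cluster
    spark-submit --master spark://master:7077 \
      --total-executor-cores 12 \
      --executor-memory 8g \
      --conf spark.streaming.backpressure.enabled=true \
      --class com.example.EventProcessor streaming-app.jar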