Question
I am using Ubuntu 16 and trying to set up a Spark cluster on my LAN.
I have managed to configure a Spark master, and I can connect a slave from the same machine and see it on localhost:8080.
When I try to connect from another machine, the problems start. I configured passwordless SSH as explained here.
When I try to connect to the master using start-slave.sh spark://master:port as explained here,
I get this error log.
I tried accessing the master using both the local IP and the local hostname (I can SSH to the master using both, without a password, both as the user and as root).
I tried port 6066 and port 7077 with both.
I don't get an error message, but the new slave does not appear on the master's localhost:8080 page.
And I keep getting this error log:
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://latitude:6066
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/26 22:09:09 INFO Worker: Started daemon with process name: 20609@name-beckup-laptop
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for TERM
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for HUP
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for INT
17/07/26 22:09:09 WARN Utils: Your hostname, name-beckup-laptop resolves to a loopback address: 127.0.1.1; using 192.168.14.84 instead (on interface wlp2s0)
17/07/26 22:09:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/07/26 22:09:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/26 22:09:09 INFO SecurityManager: Changing view acls to: name
17/07/26 22:09:09 INFO SecurityManager: Changing modify acls to: name
17/07/26 22:09:09 INFO SecurityManager: Changing view acls groups to:
17/07/26 22:09:09 INFO SecurityManager: Changing modify acls groups to:
17/07/26 22:09:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(name); groups with view permissions: Set(); users with modify permissions: Set(name); groups with modify permissions: Set()
17/07/26 22:09:09 INFO Utils: Successfully started service 'sparkWorker' on port 34777.
17/07/26 22:09:09 INFO Worker: Starting Spark worker 192.168.14.84:34777 with 4 cores, 14.6 GB RAM
17/07/26 22:09:09 INFO Worker: Running Spark version 2.2.0
17/07/26 22:09:09 INFO Worker: Spark home: /usr/local/spark
17/07/26 22:09:10 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/07/26 22:09:10 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.14.84:8081
17/07/26 22:09:10 INFO Worker: Connecting to master latitude:6066...
17/07/26 22:09:10 WARN Worker: Failed to connect to master latitude:6066
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to latitude/192.168.14.83:6066
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    ... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: latitude/192.168.14.83:6066
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    ... 1 more
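The "Connection refused" at the bottom of the trace means nothing was listening on latitude:6066 at all. A quick way to check reachability from the worker machine, independent of Spark (a minimal probe, assuming netcat is installed):

# Probe the master's ports from the worker machine.
# 7077 is the standalone master's default RPC port; 6066 is the REST submission port.
nc -zv latitude 7077
nc -zv latitude 6066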
Thanks!
Answer 1:
Found the problem!
You need to create the file conf/spark-env.sh under your Spark installation directory (you can copy conf/spark-env.sh.template to get started).
There, add the following:
SPARK_MASTER_IP='<ip of master, without port>'
Then start the master with
start-master.sh -h <ip of master> -p 7077
and after that
start-slave.sh spark://<master ip>:7077
will work like a charm.
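Putting it together, a minimal sketch of both steps (the /usr/local/spark path and the 192.168.14.83 master address are assumptions taken from the question's log, not verified values; note that Spark 2.x documents SPARK_MASTER_HOST for this setting, SPARK_MASTER_IP being the older name):

# On the master machine:
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
echo "SPARK_MASTER_IP='192.168.14.83'" >> /usr/local/spark/conf/spark-env.sh
/usr/local/spark/sbin/start-master.sh -h 192.168.14.83 -p 7077

# On each worker machine, register against the master's RPC port (7077),
# not the REST submission port (6066) that the question's log shows:
/usr/local/spark/sbin/start-slave.sh spark://192.168.14.83:7077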
Answer 2:
I had the same problem when running spark/sbin/start-slave.sh
on the master node.
hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh --master spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 --master spark://master:7077
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
full log in /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
I found my mistake: I should not use the --master
flag, and should instead just run the command
hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh spark://master:7077
following the steps of this tutorial: https://phoenixnap.com/kb/install-spark-on-ubuntu
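To confirm the worker actually registered this time, one can search the worker log named in the output above for the registration message (a hedged check; the exact message wording may vary across Spark versions):

grep -i "registered with master" /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out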
In addition, my configuration of /opt/spark/conf/spark-env.sh
is as follows:
SPARK_MASTER_HOST="master"
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
where master is the hostname of my server, as specified in /etc/hosts.
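For completeness, a minimal /etc/hosts sketch that makes that hostname resolve to a LAN address on every node (the addresses below are illustrative assumptions, not values from this answer):

# /etc/hosts on every node in the cluster
192.168.0.1   master
192.168.0.2   worker1

This also avoids the pitfall visible in the question's log, where the hostname resolved to the loopback address 127.0.1.1 (Ubuntu's default /etc/hosts entry), which other machines on the LAN cannot reach.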
Source: https://stackoverflow.com/questions/45335674/spark-start-slave-not-connecting-to-master