问题
I'm experiencing Error Expected HTTP 101 response but was '403 Forbidden'
After I setup a new Kubernetes cluster using Kubeadm
with a single master and two workers, as I submit a pyspark sample app I encountered below ERROR
message:
spark-submit command
spark-submit --master k8s://master-host:port \
--deploy-mode cluster --name test-pyspark \
--conf spark.kubernetes.container.image=mm45/pyspark-k8s-example:2.4.1 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.executor.instances=1 \
--conf spark.executor.memory=1000m \
--conf spark.driver.memory=1000m \
--conf spark.executor.cores=1 \
--conf spark.driver.cores=1 \
--conf spark.driver.maxResultSize=10g /usr/bin/run.py
Error Details:
19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
Cluster Details:
- Kubernetes version 1.13.0
- Spark version 2.4.1
- Cloud platform: AWS running on EC2
Cluster Role Binding:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: fabric8-rbac
subjects:
- kind: ServiceAccount
name: default
namespace: default
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io
Full pod logs, and error stacktrace
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver-py
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.6.8'
+ export PYTHON_VERSION=3.6.8
+ PYTHON_VERSION=3.6.8
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.32.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner file:/usr/bin/run.py
19/08/24 19:38:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/08/24 19:38:04 INFO SparkContext: Running Spark version 2.4.1
19/08/24 19:38:04 INFO SparkContext: Submitted application: calculate_pyspark_example
19/08/24 19:38:04 INFO SecurityManager: Changing view acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing view acls groups to:
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls groups to:
19/08/24 19:38:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/08/24 19:38:04 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/08/24 19:38:04 INFO SparkEnv: Registering MapOutputTracker
19/08/24 19:38:04 INFO SparkEnv: Registering BlockManagerMaster
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/08/24 19:38:04 INFO DiskBlockManager: Created local directory at /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/blockmgr-718b703d-3587-44a6-8014-02162ae3a48c
19/08/24 19:38:04 INFO MemoryStore: MemoryStore started with capacity 400.0 MB
19/08/24 19:38:04 INFO SparkEnv: Registering OutputCommitCoordinator
19/08/24 19:38:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/08/24 19:38:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:04 INFO SparkContext: Added file file:///usr/bin/run.py at spark://test-pyspark-1566675457745-driver-svc.default.svc:7078/files/run.py with timestamp 1566675484977
19/08/24 19:38:04 INFO Utils: Copying /usr/bin/run.py to /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8/userFiles-5cfd25bf-4775-404d-86c8-5a392deb1e18/run.py
19/08/24 19:38:06 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/08/24 19:38:06 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/08/24 19:38:06 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/08/24 19:38:06 INFO SparkUI: Stopped Spark web UI at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/08/24 19:38:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/08/24 19:38:06 INFO MemoryStore: MemoryStore cleared
19/08/24 19:38:06 INFO BlockManager: BlockManager stopped
19/08/24 19:38:06 INFO BlockManagerMaster: BlockManagerMaster stopped
19/08/24 19:38:06 WARN MetricsSystem: Stopping a MetricsSystem that is not running
19/08/24 19:38:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/08/24 19:38:06 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/usr/bin/run.py", line 8, in <module>
with SparkContext(conf=conf) as sc:
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/08/24 19:38:06 INFO ShutdownHookManager: Shutdown hook called
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-178fc09d-353c-4906-8899-bdac338c804a
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8
Can you please help me making sense of this?
回答1:
This is happened in the Kubernetes v1.15.3, Kubernetes v1.14.6, Kubernetes v1.13.10 version, the project spark operator have a workaround adding adding the kubernetes-client last version (kubernetes-client-4.4.2.jar) and you need to delete the actually version in your image, you can add the next lines in your Dockerfile
RUN rm $SPARK_HOME/jars/kubernetes-client-3.0.0.jar
ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar $SPARK_HOME/jars
If you ever get an Invocation error
after applying this fix, you might want to upgrade the kubernetes-model-*.jar to 4.4.2 as well.
But if you can't/don't want to upgrade your k8s-client from 3.0.0 to 4.4.2 since it's quite a huge hop and could result to legacy issues, here's a more in-depth (and more technical) solution and explanation as to what happened (ref: #SPARK-28921)
When the Kubernetes URL used doesn't specify a port (e.g., https://example.com/api/v1/...), the origin header for watch requests ends up with a port of -1 (e.g. https://example.com:-1). This happens because calling
getPort()
on a java.net.URL object that does not have a port explicitly specified will always return -1. The return value was always just inserted into the origin header.
https://github.com/fabric8io/kubernetes-client/pull/1669
As you can see here, the fix wasn't applied until kubernetes-client-4.4.x
. What I did was patch the current .jar and build a customized .jar:
- Clone / download Kubernetes-client source code from repo
- Apply the changes in this commit.
- Build the .jar file using maven plugin.
- Replace the
/opt/spark/jars/kubernetes-client-3.0.0.jar
with the customized .jar.
回答2:
if the error still exits downgrade Kubernetes version(v.1.14.3) and use spark 2.4.3 as there has been recent update in Kubernetes with the fix in golang.
reference https://andygrove.io/2019/08/apache-spark-regressions-eks/
来源:https://stackoverflow.com/questions/57643079/kubernetes-watchconnectionmanager-exec-failure-http-403