Jenkins Windows slave connection getting terminated with java.nio.channels.ClosedChannelException

假装没事ソ 提交于 2019-12-13 12:06:34

问题


While connecting to windows machine as slave, i am getting following error i think its some network related issue, but need some help where to start looking or what is a possible solution for this.

INFO: Terminated
Aug 01, 2017 10:15:54 PM hudson.remoting.JarCacheSupport$1 run
WARNING: Failed to resolve a jar 06bcb4519543f5ec83cf9d6da9f6cfbe
java.io.IOException: Failed to write to C:\Users\Administrator\.jenkins\cache\jars\06\BCB4519543F5EC83CF9D6DA9F6CFBE.jar
        at hudson.remoting.FileSystemJarCache.retrieve(FileSystemJarCache.java:133)
        at hudson.remoting.JarCacheSupport$1.run(JarCacheSupport.java:64)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:483)
        at java.util.concurrent.FutureTask.run(FutureTask.java:274)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
        at java.lang.Thread.run(Thread.java:809)
Caused by: java.io.IOException: Backing channel 'JNLP4-connect connection to dr2r4m1p21/172.20.238.41:9001' is disconnected.
        at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
        at com.sun.proxy.$Proxy4.writeJarTo(Unknown Source)
        at hudson.remoting.FileSystemJarCache.retrieve(FileSystemJarCache.java:98)
        ... 5 more
Caused by: java.nio.channels.ClosedChannelException
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:166)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
        at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
        at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
        at hudson.remoting.Engine$1$1.run(Engine.java:94)
        ... 1 more

Above mentioned stack trace is from salve (Windows) machine and my Jenkins/Master is running on RHEL, i am able to see following stacktrace there.

INFO: Accepted JNLP4-connect connection #113 from /172.20.238.31:60363
Aug 01, 2017 12:45:55 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
WARNING: Computer.threadPoolForRemoting [#42] for Build_Agent terminated
java.nio.channels.ClosedChannelException
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
        at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
        at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
        at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
        at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
        at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
        at hudson.remoting.Channel.close(Channel.java:1295)
        at hudson.remoting.Channel.close(Channel.java:1263)
        at jenkins.slaves.DefaultJnlpSlaveReceiver.afterChannel(DefaultJnlpSlaveReceiver.java:173)
        at org.jenkinsci.remoting.engine.JnlpConnectionState$4.invoke(JnlpConnectionState.java:421)
        at org.jenkinsci.remoting.engine.JnlpConnectionState.fire(JnlpConnectionState.java:312)
        at org.jenkinsci.remoting.engine.JnlpConnectionState.fireAfterChannel(JnlpConnectionState.java:418)
        at org.jenkinsci.remoting.engine.JnlpProtocol4Handler$Handler$1.run(JnlpProtocol4Handler.java:334)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

回答1:


  • I observed the same error after our jenkins master was updated. It is likely due to incompatibility between Java 7 (v80) and latest Java 8.
  • Check the java version being used by your master, and the java version of your slave.
  • In my case, I am running swarm-client-2.0-jar-with-dependencies.jar on a linux host, and it was using Java 7.

    java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

  • Our jenkins master was upgraded and is now running Java 8

    java version "1.8.0_121" Java(TM) SE Runtime Environment (build 1.8.0_121-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

  • When the java on the slave was updated to Java 8, the connection issues disappeared.



回答2:


I was experiencing a similar error as the OP where the connection to my slave was dropping. The root cause of the issue was not due to a mismatch in Java versions between Jenkins slave and master hosts.

Solution If you are running Jenkins in an EC2 instance on AWS behind an Elastic Load Balancer (ELB), increase the "idle timeout" value under the "attributes" section from the default 60 seconds. I set the new value to 600 and no longer experienced the error.

It appears that if a single command in your build process takes greater than 60 seconds with no log output, the ELB will terminate the session due to idle activity.

Source: https://issues.jenkins-ci.org/browse/JENKINS-44001?focusedCommentId=312412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-312412




回答3:


in addition to the error log in the post, I got also the error log under the jenkins directory in the slave (for me it was C:\jenkins\jenkins-slave.err.log):

JNLP file http://jenkins.domain.com/computer/my_slave_name/slave-agent.jnlp?encrypt=true has invalid arguments: [#####################################, my_slave_name, -workDir, c:\jenkins, -internalDir, remoting, -url, http://jenkins.domain.com/, -headless, -jar-cache, C:\Users\Administrator.jenkins\cache\jars] Most likely a configuration error in the master "-workDir" is not a valid option

my solution:

1)windows slave level: close the services console in the GUI for all users - this is must. from some reason Microsoft is locking installation/removal of windows services

2)windows slave level: kill all java and jenkins-slave processes (if exist)

3)windows slave level: delete the jenkins slave service (if exist) from cmd: sc delete jenkinsslave-c__jenkins /force (in my case)

4)windows slave level: verify that you have java 8 installed: i'm using jdk1.8.0_151 . uninstall all old java version

5)jenkins master ui level: Change the way the Jenkins is connect to the slave under slave configure --> Launch method: Let Jenkins control this Windows slave as a Windows service (instead of Launch agent via Java Web Start)

6) aws level: Increase the aws elb Idle timeout to 600 (from 60) - like @njtman suggested

7)jenkins master ui level: relaunch the agent in jenkins and wait several minutes.

my environment:

jenkins: 2.89.2 , os: windows 2012 R2, java: jdk1.8.0_151




回答4:


I experienced the same issue. I found out that the windows slave switched to a "sleep" mode specially if your jobs are not running against a GUI.

  • For windows... no move of the mouse or keyboard means no activity.

Then to successfully solve it. On a Windows7 slave, here is what I did:

  • Control Panel\Hardware and Sound\Power Options
  • Show additionnal plans
  • select High performance

  • Control Panel\Hardware and Sound\Power Options\Edit Plan Settings

  • turn off display never
  • Change advanced power settings -->turn off hard disk after 10000 min

Should be ok after this procedure




回答5:


Well... for me it worked the following solution:

mark the node "temporary offline" and put it back "online" again

reconnect




回答6:


The user2015131 suggestion inspired me to find my solution for this issue.

The problem

I explain my case, it may work for some people:

  1. I installed Jenkins as a service a long time before on my slave machines.
  2. I updated Java on the Jenkins' master computer.

So the Jenkins service's code stored on the slave is outdated.

The solution

Follow the next steps on every slave machine:

  1. Update Java version.

    Be sure the Java version is the same or compatible with the one installed on the master computer.

  2. Remove the old slave code. It's located inside the folder specified in the Remote Root Directory field under the node's configuration.

    I removed every jenkins-slave.* file, leaving only the jenkins_agent.pid file and the folders "remoting" and "workspace".

  3. Go to the slave node interface on Jenkins from the web browser and click on the button.

    You will download a new JNLP file to install a new (updated) Jenkins service on the slave machine.

  4. Run the downloaded file, go to menu and click on "Install as a service".

Hope it helps!




回答7:


No time to breath for virtual slave...

ok, here how I've solved my special case:

I had some VM's with libvirt/quemu running as slaves. Because the libvirt-plugin was to unreliable for me I've started those VM's on my own. I asked my self: "Why this libvirt-plugin had a mandatory delay time... Impatience...

So if the libvirt-client (slave) is saying hello to jenkins you should probably wait some secs to let this poor guy breath a bit. After starting up.

The slave was a win7 the host a ubuntu 18.04




回答8:


On Windows, I recognized that I needed to add the "-noCertificateCheck" attribute to the arguments of the jenkins-slave.xml in the workdir. We use a cert from a internal PKI on the master and this was the easiest way to work around it (having everything in the internal network).

<arguments>-Xrs  -jar "%BASE%\slave.jar" -jnlpUrl https://jenkins.ourdomain.com/computer/Windows%20build%20server%20-%20Bare%20metal/slave-agent.jnlp -secret abc -noCertificateCheck</arguments>

I recognized this by manually running the agent from the command prompt:

java -jar agent.jar -jnlpUrl https://jenkins.ourdomain.com/computer/Windows%20build%20server%20-%20Bare%20metal/slave-agent.jnlp -secret abc -workDir "D:\agentroot" -noCertificateCheck



回答9:


I was facing the same issue , rectified using below steps

  1. Go to Jenkins Url -> Manage Jenkins -> Node -> Your Node ..you will get Custom WorkDir path lets say C:/Jenkins
  2. Open WorkDir Path and delete complete remoting directory
  3. Re-launch the slave


来源:https://stackoverflow.com/questions/45433225/jenkins-windows-slave-connection-getting-terminated-with-java-nio-channels-close

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!