IBM MQ Client disconnect after 10 minutes: IBM.XMS.IllegalStateException

Posted by 自古美人都是妖i on 2021-02-10 07:07:12

Question


I am using this example from IBM. I have just copied and pasted the code:

https://github.com/ibm-messaging/mq-dev-patterns/blob/master/dotnet/dotNetGet.cs

  • I am connecting to an MQ server, version 9.0.0.5
  • I am using a .NET Framework 4.6.1 console application
  • The MQ client installed on my local machine is 9.1.0.1

I am seeing very strange behavior. The application runs normally and is able to get messages, but it disconnects after exactly 10 minutes. It is always 10 minutes.

This is the exception that is caught:

IBM.XMS.IllegalStateException: Failed to get a message from destination MY_QUEUE.
IBM MQ classes for XMS attempted to perform an MQGET; however IBM MQ reported an error.
Use the linked exception to determine the cause of this error.
   at IBM.XMS.Client.Impl.XmsMessageConsumerImpl.ReceiveInboundMessage(Int64 timeout)
   at IBM.XMS.Client.Impl.XmsMessageConsumerImpl.Receive(Int64 millis)
   at Mq_Get_Tests.SimpleConsumer.ReceiveMessages() in C:\Users\osotorrio\Projects\Temporal\Mq_Get_Tests\Mq_Get_Tests\SimpleConsumer.cs:line 137
Linked Exception : CompCode: 2, Reason: 2009*

Is the IBM example missing some configuration settings to allow the client to reconnect after 10 minutes of inactivity?


Answer 1:


The symptoms you describe appear to be a match for APAR IT26614: MQ dotnet (.NET) client channel ends abnormally when the heartbeat (HBINT) is reached.

The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v8.0       8.0.0.13
v9.0 LTS   9.0.0.7
v9.1 CD    9.1.3
v9.1 LTS   9.1.0.3

As of August 7th, 2019, 9.0.0.7 and 9.1.0.3 have been released and can be downloaded from MQC9: IBM MQ V9 Clients or MQC91: IBM MQ Clients.


To give some background on how things are supposed to work:

  1. When an MQ client application connects to the queue manager, it negotiates a heartbeat interval (HBINT), which is a value in seconds. The negotiated HBINT is always the higher of the SVRCONN value and the client application's value.
    Note: A SVRCONN HBINT has a default value of 300.
  2. Based on the HBINT, a TIMEOUT is calculated in one of two ways:
    1. If the negotiated HBINT is less than 60, the TIMEOUT is 2x HBINT (the receive timeout is HBINT seconds after the HBINT has passed).
    2. If the negotiated HBINT is greater than or equal to 60, the TIMEOUT is HBINT + 60 (the receive timeout is 60 seconds after the HBINT has passed).
  3. If no normal traffic has been received from the queue manager within the HBINT, the client should send a Heart Beat to the queue manager, which should respond. The client should then wait for the receive timeout for the Heart Beat response to arrive.
  4. The queue manager can also initiate a Heart Beat to the client, but to prevent extra traffic it waits five seconds longer than the negotiated HBINT before sending one.
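
The arithmetic in the steps above can be sketched in a few lines of C#. This is only an illustration of the rules as described here, not code from the MQ client; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper that mirrors the heartbeat rules described above.
// It is not part of IBM MQ or XMS.
static class HeartbeatTimeoutSketch
{
    // The negotiated HBINT is the higher of the two sides' values (in seconds).
    public static int NegotiateHbint(int svrconnHbint, int clientHbint)
    {
        return Math.Max(svrconnHbint, clientHbint);
    }

    // TIMEOUT: 2 x HBINT when HBINT < 60, otherwise HBINT + 60.
    public static int TotalTimeoutSeconds(int hbint)
    {
        return hbint < 60 ? 2 * hbint : hbint + 60;
    }

    // Time the client waits for the Heart Beat reply once HBINT has passed.
    public static int ReplyWaitSeconds(int hbint)
    {
        return TotalTimeoutSeconds(hbint) - hbint;
    }

    // The queue manager sends its own Heart Beat five seconds after HBINT.
    public static int QueueManagerHeartbeatAfterSeconds(int hbint)
    {
        return hbint + 5;
    }

    static void Main()
    {
        int hbint = NegotiateHbint(300, 300);
        Console.WriteLine("HBINT            : " + hbint);
        Console.WriteLine("TIMEOUT          : " + TotalTimeoutSeconds(hbint));
        Console.WriteLine("Reply wait       : " + ReplyWaitSeconds(hbint));
        Console.WriteLine("QM heartbeat at  : " + QueueManagerHeartbeatAfterSeconds(hbint));
    }
}
```

With the default HBINT of 300 this gives a TIMEOUT of 360 seconds, that is, a 60 second wait for the Heart Beat reply.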

APAR IT26614 corrects the following three issues:

  1. For both Unmanaged and Managed mode, it is documented that if you are not using a CCDT the HBINT will use the value of the SVRCONN channel. In reality, if you are not using a CCDT the HBINT on the client side defaults to 300, so this is the lowest HBINT you will see.

  2. Specific to Managed .NET, the client-side HBINT cannot be lower than the SVRCONN HBINT; if it is, the connection fails with a 2059. This problem occurs both with and without a CCDT:

    • with a CCDT you are unable to set the CLNTCONN HBINT to a value less than the SVRCONN HBINT
    • without a CCDT you will be impacted if the SVRCONN HBINT is set to 301 or higher
  3. Specific to Managed .NET, the client-side receive timeout was being calculated in milliseconds, not seconds. According to IBM this defect has been present for a long time, but it did not surface until the fix for APAR IT16167 (Managed .NET client application does not send heartbeat request to queue manager) was introduced in 8.0.0.10 and 9.0.0.4 (IBM also confirmed it is present in GA 9.1.0.0). It was not previously a problem because Managed .NET never initiated the Heart Beat; the queue manager would always send the Heart Beat at HBINT + 5 seconds and the .NET client would respond. Once that was corrected, the miscalculation of the receive timeout became visible.


Based on Managed XMS.NET client version 9.1.0.1 this is what I suspect is happening:

  1. The HBINT is negotiated to 300 seconds no matter what the SVRCONN HBINT is set to.
  2. The Managed XMS.NET client will send a Heart Beat to the queue manager after having not received anything from the queue manager for 300 seconds.
  3. At this point the Managed XMS.NET client will only wait 60ms for a response from the queue manager.
  4. If the Managed XMS.NET client does not receive a response in 60ms it will return a 2009 error to the application.
  5. The queue manager error logs will show an AMQ9209 with "An error occurred receiving data from 'dnsname (xx.xx.xx.xx)' over TCP/IP."

You mention seeing this only at 10 minutes (600 seconds), but I have seen it at any 300 second interval, depending on the latency of the network. If you are connecting to a queue manager on the same server or in the same local network segment, you may never see this problem. If you are connecting over a high-latency WAN circuit, you may experience it every 300 seconds. If you are connected over a segment with close to 30 ms of latency, you may see it intermittently.
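
To make the unit mix-up concrete, here is a minimal sketch that compares a Heart Beat reply time against the same value of 60 read once as milliseconds and once as seconds. It assumes the reply time is dominated by the network round trip, which is a simplification; the class and method names are hypothetical and nothing here is MQ client code.

```csharp
using System;

// Hypothetical illustration of the milliseconds-versus-seconds defect.
static class HeartbeatWaitSketch
{
    // Receive timeout value for a negotiated HBINT of 300.
    const int TimeoutValue = 60;

    // Returns true if a reply that takes replyMs milliseconds arrives
    // before the client gives up waiting.
    public static bool ReplyArrivesInTime(double replyMs, bool valueReadAsMilliseconds)
    {
        double waitMs = valueReadAsMilliseconds ? TimeoutValue : TimeoutValue * 1000.0;
        return replyMs <= waitMs;
    }

    static void Main()
    {
        // Example reply times in milliseconds: local network, borderline, WAN.
        double[] replyTimes = { 1.0, 59.0, 150.0 };
        foreach (double replyMs in replyTimes)
        {
            Console.WriteLine(
                replyMs + " ms reply: affected client ok = "
                + ReplyArrivesInTime(replyMs, true)
                + ", fixed client ok = "
                + ReplyArrivesInTime(replyMs, false));
        }
    }
}
```

A reply time near the 60 ms boundary sometimes makes it and sometimes does not, which is consistent with the intermittent failures described above.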

I suggest you try the 9.0.0.7 or 9.1.0.3 Managed XMS.NET client and see if it resolves the problem for you, since at these releases the client waits a full 60 seconds for the Heart Beat response from the queue manager.


If you want to add reconnect to the sample, which would mask the issue when you are not using a version of MQ that includes APAR IT26614, you can use the following settings:

cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT);
cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT);
//Note that XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT is 1800

Note that even if you use a version of MQ with APAR IT26614, the above is good practice, as it tells the Managed XMS.NET client to automatically attempt to reconnect to the queue manager for XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT seconds if the connection is lost. Reconnect does not apply to the initial connection to the queue manager; it only applies after you are connected.
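
For context, this is roughly where the two reconnect properties would sit in a consumer modeled on the IBM sample. Treat it as a sketch: the host, port, channel, queue manager and queue names are placeholders I made up, credentials and TLS settings are omitted, and it needs the IBM MQ .NET client (IBM.XMS) and a reachable queue manager to run.

```csharp
using System;
using IBM.XMS;

class ReconnectSketch
{
    static void Main()
    {
        XMSFactoryFactory factoryFactory = XMSFactoryFactory.GetInstance(XMSC.CT_WMQ);
        IConnectionFactory cf = factoryFactory.CreateConnectionFactory();

        // Placeholder connection details; replace with your own.
        cf.SetStringProperty(XMSC.WMQ_HOST_NAME, "localhost");
        cf.SetIntProperty(XMSC.WMQ_PORT, 1414);
        cf.SetStringProperty(XMSC.WMQ_CHANNEL, "DEV.APP.SVRCONN");
        cf.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, "QM1");
        cf.SetIntProperty(XMSC.WMQ_CONNECTION_MODE, XMSC.WMQ_CM_CLIENT);

        // Reconnect settings from the answer above.
        cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT);
        cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT);

        IConnection connection = cf.CreateConnection();
        try
        {
            ISession session = connection.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
            IDestination queue = session.CreateQueue("DEV.QUEUE.1"); // placeholder queue name
            IMessageConsumer consumer = session.CreateConsumer(queue);
            connection.Start();

            // Wait up to 30 seconds for one message; null means none arrived.
            IMessage message = consumer.Receive(30000);
            Console.WriteLine(message == null ? "No message received" : message.ToString());
        }
        catch (XMSException e)
        {
            // The linked exception carries the MQ completion and reason codes, e.g. 2009.
            Console.WriteLine(e);
            if (e.LinkedException != null)
            {
                Console.WriteLine(e.LinkedException);
            }
        }
        finally
        {
            connection.Close();
        }
    }
}
```

The reconnect properties have to be set on the connection factory before CreateConnection() is called, because they only take effect for connections created afterwards.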



Source: https://stackoverflow.com/questions/56937216/ibm-mq-client-disconnect-after-10-minutes-ibm-xms-illegalstateexception
