Linux Socket: How to detect disconnected network in a client program?

若如初见. Submitted on 2019-11-29 23:18:00

But if I unplug the ethernet cable, the send function still returns positive values rather than -1.

First of all, you should know that send doesn't actually send anything; it's just a memory-copying function/system call. It copies data from your process to the kernel. Some time later the kernel fetches that data, packages it into segments and packets, and sends it to the other side. Therefore send can only return an error if:

  • The socket is invalid (for example bogus file descriptor)
  • The connection is clearly invalid, for example it hasn't been established or has already been terminated in some way (FIN, RST, timeout - see below)
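  • There's no more room to copy the data (the kernel send buffer is full; a blocking socket waits in that case, a non-blocking one fails with EAGAIN/EWOULDBLOCK)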

The main point is that send doesn't send anything and therefore its return code doesn't tell you anything about data actually reaching the other side.

Back to your question: when TCP sends data it expects a valid acknowledgement within a reasonable amount of time. If it doesn't get one, it resends. How often does it resend? Each TCP stack does things differently, but the norm is to use exponential backoff. That is, first wait 1 second, then 2, then 4 and so on. On some stacks this process can take minutes.

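The main point is that, in the case of an interruption, TCP will declare a connection dead only after a seriously long period of silence (by default Linux makes about 15 retransmission attempts, which takes more than 5 minutes; the kernel documentation gives roughly 15 minutes as the lower bound for the default setting).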

One way to solve this is to implement some acknowledgement mechanism in your application. You could for example send a request to the server "reply within 5 seconds or I'll declare this connection dead" and then recv with a timeout.
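A minimal sketch of such an application-level check, assuming the server answers every "PING" line with some reply. The message format and the helper names recv_with_timeout and connection_alive are made up for illustration; they are not part of any library.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for data on fd.
 * Returns >0 bytes received, 0 if the peer closed the connection,
 * -1 on error (errno is ETIMEDOUT if nothing arrived in time). */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR); /* note: restarts the full timeout */

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return recv(fd, buf, len, 0);
}

/* Hypothetical probe: send a ping and expect any reply in time.
 * Returns 1 if a reply arrived, 0 otherwise. */
int connection_alive(int fd, int timeout_ms)
{
    static const char ping[] = "PING\n"; /* assumed protocol message */
    char reply[64];

    /* MSG_NOSIGNAL: report a broken connection as an error, not SIGPIPE. */
    if (send(fd, ping, sizeof ping - 1, MSG_NOSIGNAL) < 0)
        return 0;
    return recv_with_timeout(fd, reply, sizeof reply, timeout_ms) > 0;
}
```

Calling connection_alive(fd, 5000) matches the "reply within 5 seconds" idea: the decision comes from your own timer rather than from the kernel's retransmission schedule.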

Forhad Ahmed

To detect a remote disconnect, do a read(): it returns 0 once the peer has closed the connection.

Check this thread for more info:

Can read() function on a connected socket return zero bytes?

cloudrain21

You can't detect an unplugged ethernet cable just by calling the write() function. That's because the TCP stack retransmits the data without your application being aware of it. Here are some solutions.

Even if you have already set the keepalive option on your application's socket, you can't detect the dead connection in time if your app keeps writing to the socket. That's because the unacknowledged data keeps the kernel's TCP retransmission going, and keepalive probes are only sent on an idle connection. tcp_retries1 and tcp_retries2 are the kernel parameters that configure the TCP retransmission timeout. It's hard to predict the precise retransmission timeout because it is computed from the measured RTT. You can see this computation in RFC 793 (section 3.7, Data Communication).

https://www.rfc-editor.org/rfc/rfc793.txt

Each platform has its own kernel settings for TCP retransmission.

Linux: tcp_retries1, tcp_retries2 (in /proc/sys/net/ipv4)

http://linux.die.net/man/7/tcp

HP-UX: tcp_ip_notify_interval, tcp_ip_abort_interval

http://www.hpuxtips.es/?q=node/53

AIX: rto_low, rto_high, rto_length, rto_limit

http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf

http://www-903.ibm.com/kr/event/download/200804_324_swma/socket.pdf

You should set a lower value for tcp_retries2 (default 15) if you want to detect a dead connection earlier, but as I said, the resulting time is not precise. In addition, you currently can't set those values for a single socket only; they are global kernel parameters. There was an attempt to add a per-socket TCP retransmission option (http://patchwork.ozlabs.org/patch/55236/), but I don't think it was merged into the mainline kernel. I can't find the definitions of those options in the system header files.
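To get a feel for what a given tcp_retries2 value means, the backoff can be added up by hand. The sketch below assumes the Linux defaults of a 200 ms minimum and a 120 s maximum retransmission timeout; those two constants are assumptions of this example and are not stated above. Since the real RTO is derived from the measured RTT, the result is only a lower bound.

```c
/* Assumed Linux defaults (not taken from the text above). */
#define RTO_MIN_MS 200L
#define RTO_MAX_MS 120000L

/* Hypothetical helper: smallest total time, in milliseconds, before
 * the connection is given up after `retries` retransmissions. */
long hypothetical_timeout_ms(int retries)
{
    long rto = RTO_MIN_MS;
    long total = 0;

    /* One initial timeout plus one per retransmission. */
    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2;                 /* exponential backoff */
        if (rto > RTO_MAX_MS)
            rto = RTO_MAX_MS;     /* capped at the maximum RTO */
    }
    return total;
}
```

Under these assumptions hypothetical_timeout_ms(15) gives 924600 ms, about 15.4 minutes, while a value of 5 gives 12600 ms.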

For reference, you can monitor the keepalive timer of your socket with 'netstat --timers', as shown below. https://stackoverflow.com/questions/34914278

netstat -c --timers | grep "192.0.0.1:43245             192.0.68.1:49742"

tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (1.92/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (0.71/0/0)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (9.46/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (8.30/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (7.14/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (5.98/0/1)
tcp        0      0 192.0.0.1:43245             192.0.68.1:49742            ESTABLISHED keepalive (4.82/0/1)
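A keepalive timer like the one in the output above only exists once the option is enabled on the socket. The following is a sketch of enabling it on Linux with per-socket timing; the helper name and the example values are illustrative choices, not recommendations.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive and tune it for one socket.
 * idle_s:     seconds of idleness before the first probe
 * interval_s: seconds between probes
 * count:      unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 10, 5, 3), an idle connection to an unreachable peer would be dropped after roughly 10 + 3 * 5 = 25 seconds. As noted earlier, this does not help while the application keeps writing.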

In addition, when a keepalive timeout occurs, the event you get back depends on the platform, so you must not decide that a connection is dead based on the returned event alone. For example, HP-UX returns a POLLERR event and AIX returns just a POLLIN event when the keepalive timeout occurs. The following recv() call then fails with ETIMEDOUT.

In recent kernel versions (since 2.6.37), you can use the TCP_USER_TIMEOUT option, which works well. This option can be set on a single socket.
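A minimal sketch of setting it, assuming a Linux system whose headers define TCP_USER_TIMEOUT. The wrapper name is made up; the option value is an unsigned int in milliseconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical wrapper: limit how long transmitted data may stay
 * unacknowledged before the kernel closes the connection.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
}
```

After set_tcp_user_timeout(fd, 5000), a write to a peer that has silently vanished should lead to an ETIMEDOUT error on the socket after about 5 seconds instead of after the full retransmission schedule.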

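Finally, you can call recv() with the MSG_PEEK flag to check that the socket is okay. (read() takes no flags, so it has to be recv(). MSG_PEEK returns data from the start of the receive queue without removing it from the queue, so a later read still sees the same data.) You can therefore use this flag to check the socket without consuming anything. Note that it only reports an end-of-file or error condition the kernel already knows about.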
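One way such a probe might look, combining MSG_PEEK with MSG_DONTWAIT so that the check never blocks. The enum and function names are invented for this sketch.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for the probe below. */
enum sock_status {
    SOCK_HAS_DATA,    /* data is waiting and is still in the queue */
    SOCK_IDLE,        /* no data yet, no error known to the kernel */
    SOCK_PEER_CLOSED, /* peer closed the connection (recv returned 0) */
    SOCK_FAILED       /* pending error such as ECONNRESET or ETIMEDOUT */
};

/* Peek at one byte without consuming it and without blocking. */
enum sock_status probe_socket(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n > 0)
        return SOCK_HAS_DATA;
    if (n == 0)
        return SOCK_PEER_CLOSED;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return SOCK_IDLE;
    return SOCK_FAILED;
}
```

SOCK_IDLE does not prove the link is up; it only means the kernel has not detected a problem yet.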

Check whether send returns -1, and if so, see whether errno is equal to this value:

EPIPE
This socket was connected but the connection is now broken. In this case, send generates a SIGPIPE signal first; if that signal is ignored or blocked, or if its handler returns, then send fails with EPIPE.

Also handle the SIGPIPE signal (or ignore it), since its default action terminates the process; that way you can deal with the broken connection in a controlled manner.
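A small sketch of both approaches on Linux: ignoring SIGPIPE for the whole process, or suppressing it per call with MSG_NOSIGNAL. The function names are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Option 1: ignore SIGPIPE process-wide, so that send() and write()
 * fail with EPIPE instead of killing the program. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Option 2: suppress the signal for a single call with MSG_NOSIGNAL.
 * Returns the result of send(); *broken is set to 1 when the error
 * says the connection is gone (EPIPE or ECONNRESET), else to 0. */
ssize_t send_checked(int fd, const void *buf, size_t len, int *broken)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);

    *broken = (n < 0 && (errno == EPIPE || errno == ECONNRESET));
    return n;
}
```

Like any send error, EPIPE only appears once the kernel already considers the connection broken, so it does not shorten the detection delay described in the earlier answers.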
