Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?

Question

After writing an answer about TCP_NODELAY and TCP_CORK, I realized that my knowledge of TCP_CORK's finer points must be lacking, since it's not 100% clear to me why the Linux developers felt it necessary to introduce a new TCP_CORK flag, rather than just relying on the application to set or clear the existing TCP_NODELAY flag at the appropriate times.

In particular, if I have a Linux application that wants to send() some small/non-contiguous fragments of data over a TCP stream without paying the 200 ms Nagle latency tax, and at the same time minimize the number of packets needed to send it, I can do it in either of these two ways:

With TCP_CORK (pseudocode):

int optval = 1;
setsockopt(sk, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));   // put a cork in it
send(sk, ..);
send(sk, ..);
send(sk, ..);
optval = 0;
setsockopt(sk, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));   // release the cork

or with TCP_NODELAY (pseudocode):

int optval = 0;
setsockopt(sk, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));   // turn on Nagle's algorithm
send(sk, ..);
send(sk, ..);
send(sk, ..);
optval = 1;
setsockopt(sk, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));   // turn Nagle's algorithm back off

I've been using the latter technique for years with good results, and it has the benefit of being portable to non-Linux OSes as well (although outside of Linux you have to call send() again after turning Nagle's algorithm back off, in order to ensure the packets get sent immediately and avoid the Nagle delay; send()'ing zero bytes is sufficient).

Now the Linux devs are smart guys, so I doubt that the above usage of TCP_NODELAY never occurred to them. There must be some reason why they felt it was insufficient, which led them to introduce a new/proprietary TCP_CORK flag instead. Can anybody explain what that reason was?

Answer 1:

You have two questions:

1. Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?
2. There must be some reason why they felt it was insufficient, which led them to introduce a new/proprietary TCP_CORK flag instead. Can anybody explain what that reason was?

First see the answers to this Stack Overflow question; they are related in the sense that they describe the difference between the two in general, without reference to your use case.

TCP_NODELAY ON means send the data (partial frames) the moment you get it, regardless of whether you have enough data for a full network packet.
TCP_NODELAY OFF means Nagle's algorithm applies: data is sent once there is at least an MSS worth of it, while smaller data waits until the acknowledgement for previously sent data has been received.
TCP_CORK ON means don't send any data (partial frames) smaller than the MSS until the application says so or until 200 ms have passed.
TCP_CORK OFF means send all the data (partial frames) now.
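The four states above are just two boolean socket options. As a minimal sketch, assuming Linux and a TCP socket, the helpers below set one of these options and read it back; the names set_tcp_flag and get_tcp_flag are hypothetical and not part of any library.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch a boolean TCP-level option (TCP_NODELAY,
 * TCP_CORK) on or off. Returns 0 on success, -1 on error (errno is set). */
static int set_tcp_flag(int fd, int option, int enabled)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, option, &enabled, sizeof(enabled));
}

/* Hypothetical helper: read a boolean TCP-level option back.
 * Returns 1 if set, 0 if clear, -1 on error (errno is set). */
static int get_tcp_flag(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof(value);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

For example, set_tcp_flag(fd, TCP_CORK, 1) corresponds to "TCP_CORK ON" and set_tcp_flag(fd, TCP_NODELAY, 0) to "TCP_NODELAY OFF" in the list above.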

This means that in your first example no partial frames are sent until the end (when the cork is released), whereas in your second example partial frames will be sent as soon as the outstanding data has been acknowledged.

Also, in your first example Nagle's algorithm still applies to the partial frames left after the uncorking, whereas in your second example it does not.
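The corked variant from the question could be written out roughly as follows. This is only a sketch under stated assumptions: Linux, an already connected blocking TCP socket, and NUL-terminated string fragments; the function name send_corked is made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: cork the socket, queue every fragment, then release
 * the cork so the kernel sends whatever is still pending.
 * Returns the total number of bytes queued, or -1 on error. */
static ssize_t send_corked(int fd, const char *const parts[], size_t count)
{
    int on = 1, off = 0;
    ssize_t total = 0;

    assert(parts != NULL);
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;

    for (size_t i = 0; i < count; i++) {
        ssize_t n = send(fd, parts[i], strlen(parts[i]), 0);
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
    }

    /* Release the cork even if a send() failed, so nothing stays held back. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        return -1;
    return total;
}
```

Releasing the cork on the error path as well is a design choice of this sketch: it avoids leaving the socket corked when the caller decides to keep using it.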

The short version is that TCP_NODELAY does not accumulate logical packets before sending them as network packets, Nagle's algorithm accumulates them according to its own rules, and TCP_CORK accumulates them according to what the application tells it.

A side effect of this is that Nagle's algorithm will send partial frames on an idle connection, whereas TCP_CORK will not.

Additionally, TCP_CORK was introduced into the Linux kernel in 2.2 (specifically 2.1.127, see here), but until 2.5.71 it was mutually exclusive with TCP_NODELAY. That is, in 2.4 kernels you could use one or the other, but in 2.6 you can combine the two, and TCP_CORK takes precedence while it is applied.
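Whether the two options can be combined on a given kernel can be checked by setting both and reading them back. The following is a rough sketch, assuming Linux and a TCP socket; cork_and_nodelay_combine is a hypothetical name, and what a pre-2.5.71 kernel would actually report here is not something this sketch has been checked against.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable TCP_NODELAY and TCP_CORK together and check
 * that both read back as set. Returns 1 if they are both set, 0 if not,
 * -1 on error. Leaves TCP_NODELAY on and TCP_CORK off afterwards. */
static int cork_and_nodelay_combine(int fd)
{
    int on = 1, off = 0, nodelay = 0, cork = 0;
    socklen_t len;

    assert(fd >= 0);
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;

    len = sizeof(nodelay);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, &len) < 0)
        return -1;
    len = sizeof(cork);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &cork, &len) < 0)
        return -1;

    /* Release the cork again; TCP_NODELAY stays as set above. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        return -1;
    return nodelay != 0 && cork != 0;
}
```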

Regarding your second question.

To quote Linus Torvalds:

Now, TCP_CORK is basically me telling David Miller that I refuse to play games to have good packet size distribution, and that I wanted a way for the application to just tell the OS: I want big packets, please wait until you get enough data from me that you can make big packets.

Basically, TCP_CORK is a kind of "anti-nagle" flag. It's the reverse of "no-nagle".

Another quote, also by Linus, regarding the usage of TCP_CORK:

Basically, TCP_CORK is useful whenever the server knows the patterns of its bulk transfers. Which is just about 100% of the time with any kind of file serving.

For more quotes, see the Sendfile Kernel Mailing Discussion link in the references below.

In summary, in addition to TCP_MAXSEG, the MSG_MORE flag to send(), and gathered writes with writev(), TCP_CORK is another tool which gives a userspace application more fine-grained control over packet size distribution.
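For comparison, here is a hedged sketch of the other tools named in the summary, assuming Linux and an already connected TCP socket. The function names current_mss, send_with_more and send_gathered are hypothetical, and the header/body split is only an illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read the socket's current maximum segment size via TCP_MAXSEG.
 * Returns the MSS in bytes, or -1 on error. */
static int current_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}

/* MSG_MORE on the first send() hints that more data follows, so the kernel
 * may hold the partial segment; the second send() without the flag ends
 * the hint. Returns the total bytes queued, or -1 on error. */
static ssize_t send_with_more(int fd, const char *header, const char *body)
{
    assert(header != NULL && body != NULL);
    ssize_t h = send(fd, header, strlen(header), MSG_MORE);
    if (h < 0)
        return -1;
    ssize_t b = send(fd, body, strlen(body), 0);
    if (b < 0)
        return -1;
    return h + b;
}

/* writev() hands both fragments to the kernel in a single system call.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t send_gathered(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    assert(header != NULL && body != NULL);
    iov[0].iov_base = (void *)header;   /* writev() only reads the buffers */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);
    return writev(fd, iov, 2);
}
```

Unlike TCP_CORK, neither send_with_more nor send_gathered leaves any state on the socket, which can be simpler when only a pair of fragments needs to travel together.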

References and further reading

Earthquaky kernel interfaces
Sendfile Kernel Mailing Discussion (where the quote comes from)
TCP/IP options for high-performance data transmission
Rethinking the TCP Nagle Algorithm
TCP_CORK: More than you ever wanted to know
The C10K problem
TCP man page
The Linux Programming Interface Page 1262

Source: https://stackoverflow.com/questions/22124098/is-there-any-significant-difference-between-tcp-cork-and-tcp-nodelay-in-this-use

Tags

Linux

tcp

nagle