I have a Java app on Linux which opens a UDP socket and waits for messages.
After a couple of hours under heavy load, there is packet loss, i.e. the packets are receiv
rx_queue will tell you the queue length at any given instant, but it will not tell you how full the queue has been, i.e. the highwater mark. There is no way to constantly monitor this value, and no way to get it programmatically (see How do I get amount of queued data for UDP socket?).
The only way I can imagine monitoring the queue length is to move the queue into your own program. In other words, start two threads -- one reads the socket as fast as it can and dumps the datagrams into your queue, and the other is your program pulling from this queue and processing the packets. This of course assumes that you can ensure each thread is on a separate CPU. Now you can monitor the length of your own queue and keep track of the highwater mark.
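The two-thread design described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name MonitoredUdpReceiver and its methods are made up for the example, the queue is unbounded, and nothing here pins threads to CPUs (plain Java has no API for that). The high-water mark is sampled right after each insert, so it is approximate when a consumer is removing items at the same moment.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: moves the queue into the application so its depth can be observed.
class MonitoredUdpReceiver {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger highWaterMark = new AtomicInteger();
    private final DatagramSocket socket;

    MonitoredUdpReceiver(DatagramSocket socket) {
        this.socket = socket;
    }

    // Reader thread: does nothing except copy datagrams from the socket into the queue.
    Thread startReader() {
        Thread reader = new Thread(() -> {
            byte[] buf = new byte[65535]; // large enough for any UDP payload
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            try {
                while (true) {
                    packet.setLength(buf.length);
                    socket.receive(packet);
                    queue.add(Arrays.copyOf(buf, packet.getLength()));
                    // Remember the deepest queue seen so far.
                    highWaterMark.accumulateAndGet(queue.size(), Math::max);
                }
            } catch (IOException e) {
                // The socket was closed (or failed); the reader simply exits.
            }
        }, "udp-reader");
        reader.setDaemon(true);
        reader.start();
        return reader;
    }

    // Called by the processing thread; blocks until a datagram is available.
    byte[] take() throws InterruptedException {
        return queue.take();
    }

    int currentDepth() {
        return queue.size();
    }

    int highWaterMark() {
        return highWaterMark.get();
    }

    public static void main(String[] args) throws Exception {
        // Loopback demo: nobody consumes, so the queue depth grows with each datagram.
        try (DatagramSocket rx = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket tx = new DatagramSocket()) {
            MonitoredUdpReceiver receiver = new MonitoredUdpReceiver(rx);
            receiver.startReader();
            byte[] payload = "ping".getBytes();
            for (int i = 0; i < 3; i++) {
                tx.send(new DatagramPacket(payload, payload.length,
                        rx.getLocalAddress(), rx.getLocalPort()));
            }
            Thread.sleep(200); // give the reader time to empty the socket
            System.out.println("depth=" + receiver.currentDepth()
                    + " highWaterMark=" + receiver.highWaterMark());
        }
    }
}
```

In a real application the processing thread would call take() in a loop, and a timer or log statement would report highWaterMark() periodically.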
The process is simple:
If desired, pause the application process.
Open the UDP socket. You can snag it from the running process using /proc/<PID>/fd if necessary. Or you can add this code to the application itself and send it a signal -- it will already have the socket open, of course.
Call recvmsg in a tight loop as quickly as possible.
Count how many packets/bytes you got.
This will discard any datagrams currently buffered, but if that breaks your application, your application was already broken.
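For the variant where the draining code lives inside the application itself, the steps above might look something like this in Java. It is a sketch under assumptions: the socket is a java.nio DatagramChannel, the class and method names (UdpDrain, drain) are invented for the example, and the non-blocking receive() stands in for the recvmsg loop. It counts payload bytes only, which may not match the kernel's own accounting in rx_queue, and datagrams that arrive while the loop is running are counted too.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

// Hypothetical helper: empties a UDP socket and reports how much was waiting.
class UdpDrain {
    // Returns {datagrams, payloadBytes} read before the receive buffer ran dry.
    static long[] drain(DatagramChannel channel) throws IOException {
        boolean wasBlocking = channel.isBlocking();
        channel.configureBlocking(false); // receive() now returns null when empty
        ByteBuffer buf = ByteBuffer.allocate(65535);
        long packets = 0;
        long bytes = 0;
        try {
            while (true) {
                buf.clear();
                if (channel.receive(buf) == null) {
                    break; // nothing left in the socket buffer
                }
                packets++;
                bytes += buf.position(); // payload size of this datagram
            }
        } finally {
            channel.configureBlocking(wasBlocking); // restore the caller's mode
        }
        return new long[] {packets, bytes};
    }

    public static void main(String[] args) throws Exception {
        // Loopback demo: queue a few datagrams, then drain and count them.
        try (DatagramChannel rx = DatagramChannel.open();
             DatagramChannel tx = DatagramChannel.open()) {
            rx.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            for (int i = 0; i < 3; i++) {
                tx.send(ByteBuffer.wrap("hello".getBytes()), rx.getLocalAddress());
            }
            Thread.sleep(200); // let the datagrams land in the receive buffer
            long[] result = drain(rx);
            System.out.println("packets=" + result[0] + " bytes=" + result[1]);
        }
    }
}
```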
Linux provides the files /proc/net/udp and /proc/net/udp6, which list all open UDP sockets (for IPv4 and IPv6, respectively). In both of them, the columns tx_queue and rx_queue show the outgoing and incoming queues in bytes.
If everything is working as expected, you usually will not see any value other than zero in those two columns: as soon as your application generates packets they are sent through the network, and as soon as those packets arrive from the network your application will wake up and receive them (the recv call immediately returns). You may see the rx_queue go up if your application has the socket open but is not invoking recv to receive the data, or if it is not processing such data fast enough.
UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!
If you have a program that waits for UDP datagrams, and then goes off to process them before returning to wait for another, then your elapsed processing time per datagram always needs to be shorter than the interval between datagrams at the worst case arrival rate. If it is not, then the UDP socket receive queue will begin to fill.
This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.
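As one possible shape for the multi-threaded option, the sketch below keeps a single thread on the socket and hands each datagram to a fixed pool of workers. It is an illustration, not a recommendation: the names PooledUdpServer and serve, the pool size, and the handler callback are all assumptions, the executor's internal queue is unbounded, and datagrams may be processed out of arrival order.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical server loop: one receiving thread, several processing threads.
class PooledUdpServer {
    // Receives on the calling thread and returns once the socket is closed.
    static void serve(DatagramSocket socket, int workers, Consumer<byte[]> handler) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        byte[] buf = new byte[65535];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        try {
            while (true) {
                packet.setLength(buf.length);
                socket.receive(packet);
                // Copy the payload so the receive buffer can be reused immediately.
                byte[] payload = Arrays.copyOf(buf, packet.getLength());
                pool.execute(() -> handler.accept(payload));
            }
        } catch (IOException e) {
            // Socket closed or failed: stop receiving.
        } finally {
            pool.shutdown(); // queued work still runs, then the workers exit
        }
    }

    public static void main(String[] args) throws Exception {
        DatagramSocket rx = new DatagramSocket(0, InetAddress.getLoopbackAddress());
        Thread server = new Thread(() -> serve(rx, 4, p ->
                System.out.println(Thread.currentThread().getName()
                        + " got " + p.length + " bytes")));
        server.start();
        try (DatagramSocket tx = new DatagramSocket()) {
            byte[] payload = "hello".getBytes();
            for (int i = 0; i < 5; i++) {
                tx.send(new DatagramPacket(payload, payload.length,
                        rx.getLocalAddress(), rx.getLocalPort()));
            }
        }
        Thread.sleep(200);
        rx.close(); // makes receive() throw, which ends serve()
        server.join();
    }
}
```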
As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like this:
$ cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3466 2 ffff88013abc8340 0
67: 00000000:231D 00000000:0000 07 00000000:0001E4C8 00:00000000 00000000 1006 0 16940862 2 ffff88013abc9040 2237
122: 00000000:30D4 00000000:0000 07 00000000:00000000 00:00000000 00000000 1006 0 912865 2 ffff88013abc8d00 0
From this, I can see that a socket owned by user id 1006 is listening on port 0x231D (8989) and that the receive queue holds 0x0001E4C8 (124104) bytes, i.e. it is at about 128KB. As 128KB is the max size on my system, this tells me my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far, meaning the UDP layer cannot put any more datagrams into the socket queue, and must drop them.
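Since the application in the question is Java, the same reading of the table can be done in code. The sketch below assumes the column layout shown in the listing above (queues in hexadecimal, drops as the 13th field in decimal); the class name ProcNetUdp and its fields are invented for the example, and a missing drops column is reported as -1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for rows of /proc/net/udp (or /proc/net/udp6).
class ProcNetUdp {
    final int localPort;
    final long txQueue;
    final long rxQueue;
    final long drops;

    ProcNetUdp(int localPort, long txQueue, long rxQueue, long drops) {
        this.localPort = localPort;
        this.txQueue = txQueue;
        this.rxQueue = rxQueue;
        this.drops = drops;
    }

    static ProcNetUdp parseLine(String line) {
        String[] f = line.trim().split("\\s+");
        // f[1] is local_address as HEXADDR:HEXPORT
        String local = f[1];
        int port = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
        // f[4] is tx_queue:rx_queue, both hexadecimal byte counts
        String[] queues = f[4].split(":");
        long tx = Long.parseLong(queues[0], 16);
        long rx = Long.parseLong(queues[1], 16);
        // f[12] is drops (decimal); assumed absent on kernels without that column
        long drops = f.length > 12 ? Long.parseLong(f[12]) : -1;
        return new ProcNetUdp(port, tx, rx, drops);
    }

    static List<ProcNetUdp> readAll(String path) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path));
        List<ProcNetUdp> sockets = new ArrayList<>();
        // Skip the header row, then parse every non-empty line.
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            if (!line.trim().isEmpty()) {
                sockets.add(parseLine(line));
            }
        }
        return sockets;
    }

    public static void main(String[] args) throws IOException {
        for (ProcNetUdp s : readAll("/proc/net/udp")) {
            System.out.println("port=" + s.localPort
                    + " rx_queue=" + s.rxQueue + " drops=" + s.drops);
        }
    }
}
```

Calling readAll on a timer and remembering the largest rxQueue seen for your port gives a sampled approximation of the queue's peak, although bursts between samples will be missed.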
You could watch your program's behaviour over time e.g. using:
watch -d 'cat /proc/net/udp|grep 00000000:231D'
Note also that the netstat command does about the same thing: netstat -c --udp -an
My solution for my weenie program will be to multi-thread.
Cheers!