How to monitor Linux UDP buffer available space?

别那么骄傲 2020-11-29 18:11

I have a Java app on Linux which opens a UDP socket and waits for messages.

After a couple of hours under heavy load, there is packet loss, i.e. the packets are receiv

4 answers
  • 2020-11-29 18:45

    rx_queue will tell you the queue length at any given instant, but it will not tell you how full the queue has been, i.e. the high-water mark. There is no way to constantly monitor this value, and no way to get it programmatically (see How do I get amount of queued data for UDP socket?).

    The only way I can imagine monitoring the queue length is to move the queue into your own program. In other words, start two threads: one reads the socket as fast as it can and dumps the datagrams into your own queue, and the other is your program pulling from this queue and processing the packets. This of course assumes that you can ensure each thread runs on a separate CPU. Now you can monitor the length of your own queue and keep track of the high-water mark.
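
    A minimal sketch of that two-thread layout, written in Python for brevity rather than in the asker's Java. The class name `QueueingReceiver` and its attributes are hypothetical, the queue is unbounded, and the sketch does not pin threads to CPUs; it only shows where the high-water mark would be recorded.

```python
import queue
import socket
import threading


class QueueingReceiver:
    """Drain a UDP socket into an in-process queue and record its peak depth."""

    def __init__(self, sock, max_datagram=65535):
        self.sock = sock
        self.max_datagram = max_datagram
        self.pending = queue.Queue()   # unbounded, so the reader never blocks on put
        self.high_water = 0            # largest queue depth observed so far
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._read_loop, daemon=True)

    def start(self):
        self.sock.settimeout(0.2)      # wake up periodically to notice stop()
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _read_loop(self):
        # Reader thread: do nothing but move datagrams out of the kernel buffer.
        while not self._stop.is_set():
            try:
                data, addr = self.sock.recvfrom(self.max_datagram)
            except socket.timeout:
                continue
            self.pending.put((data, addr))
            # qsize() is approximate under concurrency, which is enough for monitoring.
            self.high_water = max(self.high_water, self.pending.qsize())
```

    The processing thread would then call `receiver.pending.get()` in its own loop, and a monitoring thread (or a log line) can read `receiver.high_water` at any time.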

  • 2020-11-29 18:47

    The process is simple:

    1. If desired, pause the application process.

    2. Open the UDP socket. You can snag it from the running process using /proc/<PID>/fd if necessary. Or you can add this code to the application itself and send it a signal -- it will already have the socket open, of course.

    3. Call recvmsg in a tight loop as quickly as possible.

    4. Count how many packets/bytes you got.

    This will discard any datagrams currently buffered, but if that breaks your application, your application was already broken.
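
    Steps 3 and 4 can be sketched as follows, assuming the code runs inside the process that already owns the socket. The function name `drain_and_count` is hypothetical, and the sketch uses a non-blocking `recv` rather than `recvmsg`. It counts payload bytes, which will likely be lower than the rx_queue figure in /proc/net/udp because the kernel also charges per-packet bookkeeping to the buffer.

```python
import socket


def drain_and_count(sock, max_datagram=65535):
    """Read and discard everything currently queued; return (packets, payload_bytes)."""
    packets = 0
    payload_bytes = 0
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)            # recv() now raises instead of waiting
    try:
        while True:
            try:
                data = sock.recv(max_datagram)
            except BlockingIOError:
                break                  # the receive queue is empty
            packets += 1
            payload_bytes += len(data)
    finally:
        sock.settimeout(previous_timeout)   # restore the caller's blocking mode
    return packets, payload_bytes
```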

  • 2020-11-29 18:57

    Linux provides the files /proc/net/udp and /proc/net/udp6, which list all open UDP sockets (for IPv4 and IPv6, respectively). In both of them, the columns tx_queue and rx_queue show the outgoing and incoming queues in bytes.

    If everything is working as expected, you will usually not see any value other than zero in those two columns: as soon as your application generates packets they are sent through the network, and as soon as those packets arrive from the network your application wakes up and receives them (the recv call returns immediately). You may see rx_queue go up if your application has the socket open but is not invoking recv to receive the data, or if it is not processing the data fast enough.
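
    Since these files are plain text, a program can read them as well. The sketch below is one possible parser; the function name and the dictionary keys are hypothetical. It assumes the usual layout, in which the fifth whitespace-separated field is `tx_queue:rx_queue` in hexadecimal and the thirteenth, when present, is a decimal drop counter.

```python
def parse_proc_net_udp(text):
    """Parse the contents of /proc/net/udp (or udp6) into a list of dicts."""
    sockets = []
    for line in text.splitlines()[1:]:           # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue                              # blank or malformed line
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        tx_hex, rx_hex = fields[4].split(":")
        sockets.append({
            "local_port": local_port,
            "tx_queue": int(tx_hex, 16),          # bytes waiting to be sent
            "rx_queue": int(rx_hex, 16),          # bytes waiting to be read
            # guard in case the drops column is absent
            "drops": int(fields[12]) if len(fields) > 12 else None,
        })
    return sockets
```

    To use it, pass in the file contents, e.g. `parse_proc_net_udp(open("/proc/net/udp").read())`, and look for entries whose `rx_queue` is not zero.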

  • 2020-11-29 19:06

    UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!

    If you have a program that waits for a UDP datagram, goes off to process it, and only then returns to wait for the next one, then the time spent processing each datagram must always be shorter than the shortest interval between arrivals. If it is not, the UDP socket receive queue will begin to fill.

    This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.
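
    To get a feel for how short a tolerable burst is, here is a rough back-of-the-envelope helper. The function name and every number in the usage note are hypothetical, and the estimate assumes a constant rate and a fixed buffer cost per datagram, which real traffic will not match exactly.

```python
def seconds_until_full(buffer_bytes, bytes_per_datagram, arrivals_per_s, handled_per_s):
    """Estimate how long a sustained overload takes to fill the receive queue.

    Returns None when the consumer keeps up, i.e. the backlog never grows.
    """
    backlog_growth = arrivals_per_s - handled_per_s   # datagrams per second
    if backlog_growth <= 0:
        return None
    return buffer_bytes / (backlog_growth * bytes_per_datagram)
```

    For example, with a 128 KiB buffer, 1024 bytes charged per datagram, 10000 arrivals per second and 9000 handled per second, `seconds_until_full(131072, 1024, 10000, 9000)` gives 0.128, so drops would start after roughly an eighth of a second of overload.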

    As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like this:

    $ cat /proc/net/udp   
      sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops             
      40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 3466 2 ffff88013abc8340 0           
      67: 00000000:231D 00000000:0000 07 00000000:0001E4C8 00:00000000 00000000  1006        0 16940862 2 ffff88013abc9040 2237    
     122: 00000000:30D4 00000000:0000 07 00000000:00000000 00:00000000 00000000  1006        0 912865 2 ffff88013abc8d00 0         
    

    From this, I can see that a socket owned by user id 1006 is listening on port 0x231D (8989) and that the receive queue holds 0x1E4C8 (124,104) bytes, about 121 KB. As 128KB is the max size on my system, this tells me the queue is nearly full and my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far: each time, the UDP layer could not fit an arriving datagram into the socket queue and had to drop it.
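
    To compare rx_queue against the limit for your own socket rather than a system-wide figure, the receive buffer size can be read with `getsockopt`. This is a small sketch with hypothetical function names; note that on Linux the value returned for `SO_RCVBUF` is double what was requested through `setsockopt`, because the kernel reserves room for bookkeeping.

```python
import socket


def receive_buffer_limit(sock):
    """Return the socket's receive buffer size in bytes, as reported by the kernel."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def fill_ratio(rx_queue_bytes, limit_bytes):
    """Fraction of the receive buffer that is currently occupied."""
    return rx_queue_bytes / limit_bytes
```

    Taking the sample line and assuming the 128KB limit means 131072 bytes, `fill_ratio(0x1E4C8, 128 * 1024)` comes to about 0.95, i.e. the buffer is roughly 95% full.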

    You could watch your program's behaviour over time e.g. using:

    watch -d 'grep 00000000:231D /proc/net/udp'
    

    Note also that the netstat command does about the same thing: netstat -c --udp -an
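
    The same watching can be done from a script that remembers the largest rx_queue it has seen and how many drops accumulated in the meantime. This is only a sketch: the function names and the `read_text` hook are hypothetical, it looks at IPv4 sockets only, and because it samples at intervals it can miss peaks that occur between two reads.

```python
import time


def rx_stats_for_port(text, port):
    """Return (rx_queue_bytes, drops) for the first socket bound to `port`, or None."""
    needle = ":%04X" % port                  # ports are printed as 4 uppercase hex digits
    for line in text.splitlines()[1:]:       # skip the header row
        fields = line.split()
        if len(fields) > 12 and fields[1].endswith(needle):
            return int(fields[4].split(":")[1], 16), int(fields[12])
    return None


def watch_port(port, samples, interval_s=1.0, read_text=None):
    """Sample the socket `samples` times; return (peak rx_queue, drops gained)."""
    if read_text is None:
        def read_text():
            with open("/proc/net/udp") as f:
                return f.read()
    peak = 0
    first_drops = None
    last_drops = None
    for i in range(samples):
        stats = rx_stats_for_port(read_text(), port)
        if stats is not None:
            rx_queue, drops = stats
            peak = max(peak, rx_queue)
            if first_drops is None:
                first_drops = drops
            last_drops = drops
        if i + 1 < samples:
            time.sleep(interval_s)
    gained = 0 if first_drops is None else last_drops - first_drops
    return peak, gained
```

    For instance, `watch_port(8989, samples=60)` would observe the socket from the example for about a minute.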

    My solution for my weenie program will be to multi-thread it.

    Cheers!
