I poll a lot of devices in network (more than 300) by iterative ping.
The program polls the devices sequentially, so it\'s slow. I\'d like to enhance the speed of pollin
Personally I would go with IOCP. I'm using that very successfully for the transport implementation in NexusDB.
If you want to perform 300 send/receive cycles using blocking sockets and threads in parallel, you end up needing 300 threads.
With IOCP, after you've associated the sockets with the IOCP, you can perform the 300 send operations, and they will return instantly before the operation is completed. As the operations are completed, so called completion packages will be queued to the IOCP. You then have a pool of threads waiting on the IOCP, and the OS wakes them up as the completion packets come in. In reaction to completed send operations you can then perform the receive operations. The receive operations also return instantly and once actually completed get queued to the IOCP.
The real special thing about an IOCP is that it knows which threads belong to it and are currently processing completion packages. And the IOCP only wakes up new threads if the total number of active threads (not in a kernel mode wait state) is lower than the concurrency number of the IOCP (by default that equals the number of logical cores available on the machine). Also, if there are threads waiting for completion packages on the IOCP (which haven't been started yet despite completion packages being queued because the number of active threads was equal to the concurrancy number), the moment one of the threads that is currently processing a completion package enters a kernel mode wait state for any reason, one of the waiting threads is started.
Threads returning to the IOCP pick up completion packages in LIFO order. That is, if a thread is returning to the IOCP and there are completion packages still waiting, that thread directly picks up the next completion package, instead of being put into a wait state and the thread waiting for the longest time waking up.
Under optimal conditions, you will have a number of threads equal to the number of available cores running concurrently (one on each core), picking up the next completion package, processing it, returning to the IOCP and directly picking up the next completion package, all without ever entering a kernel mode wait state or a thread context switch having to take place.
If you would have 300 threads and blocking operations instead, not only would you waste at least 300 MB address space (for the reserved space for the stacks), but you would also have constant thread context switches as one thread enters a wait state (waiting for a send or receive to complete) and the next thread with a completed send or receive waking up. – Thorsten Engler 12 hours ago