How to loop through only active file descriptors from fd_set result from select()?

Question

My current server implementation looks something like this:

  void loop(){
     fd_set readfds;

     while(true){

        // step 1: clear the set
        FD_ZERO(&readfds);

        // step 2: add every active socket to the set
        loop_through_sockets_and_add_active_sockets_to(&readfds);

        // step 3: wait for incoming data (tv is the select() timeout)
        switch(select(FD_SETSIZE, &readfds, 0, 0, &tv)) {
           case SOCKET_ERROR:
              patia->receiveEvent(Error, net::getError());
              return;
           case 0:
              return;
        }

        // step 4: loop through the sockets and check, using FD_ISSET,
        // which read fds have incoming data.

     }
  }

Not clearing the fd_set on every pass, and instead using FD_SET/FD_CLR only when channels are added or removed, would be a better way to do things.
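One detail matters for that idea: select() overwrites the fd_set it is given, leaving only the ready descriptors set, so a set maintained purely with FD_SET/FD_CLR cannot be passed to select() directly. A common pattern is to keep a master set that changes only when connections open or close, and hand select() a copy on each pass. This is a minimal sketch assuming a single-threaded server; master_init, master_add, master_remove and wait_readable are hypothetical names, not a library API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical master set, changed only when a connection opens or closes. */
static fd_set master_set;
static int max_fd = -1;   /* highest descriptor in master_set, or -1 */

void master_init(void)
{
    FD_ZERO(&master_set);
    max_fd = -1;
}

void master_add(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET is undefined outside this range */
    FD_SET(fd, &master_set);
    if (fd > max_fd)
        max_fd = fd;
}

void master_remove(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_CLR(fd, &master_set);
    /* If the highest descriptor went away, walk down to the next one. */
    while (max_fd >= 0 && !FD_ISSET(max_fd, &master_set))
        --max_fd;
}

/* select() overwrites the set it receives, so give it a copy of the master
   set and leave the master untouched. Returns select()'s result. */
int wait_readable(fd_set *ready, struct timeval *timeout)
{
    *ready = master_set;   /* struct assignment copies every bit */
    return select(max_fd + 1, ready, NULL, NULL, timeout);
}
```

Copying the set is a single struct assignment (typically 128 bytes for a 1024-descriptor fd_set), which is cheap compared with rebuilding it from a list of sockets; passing max_fd + 1 instead of FD_SETSIZE also lets the kernel stop scanning early.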

My question is: how can you loop through the fd_set after select() without checking every descriptor for membership with FD_ISSET()?

I mean, with 4000 active connections, whenever there's incoming data the loop above may have to go through up to 4000 sockets before reaching the right one. If all the connections are busy, the complexity would be O(n^2)!

Answer 1:

> My question is: how can you loop through the fd_set after select() without checking every descriptor for membership with FD_ISSET()?

You can't.

There is one slight optimisation: select() returns the number of ready descriptors, so if you count the ones you have processed, you can stop as soon as you have handled them all instead of scanning to the end of the set.
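A minimal sketch of that early exit, assuming only a read set was passed to select() (when write or exception sets are passed too, the return value counts ready bits across all three sets). collect_ready and its max_fd parameter (the highest descriptor that was put in the set) are hypothetical names.

```c
#include <assert.h>
#include <sys/select.h>

/* Copy the ready descriptors from `ready` into `out`, scanning 0..max_fd but
   stopping as soon as `nready` of them have been found. `nready` is assumed
   to be select()'s return value for a call that passed only this read set.
   Returns the number of descriptors found. */
int collect_ready(fd_set *ready, int max_fd, int nready, int *out)
{
    assert(max_fd < FD_SETSIZE);
    int found = 0;
    for (int fd = 0; fd <= max_fd && found < nready; ++fd) {
        if (FD_ISSET(fd, ready))
            out[found++] = fd;
    }
    return found;
}
```

In the worst case (the highest descriptor is ready) this still scans up to max_fd, but when only a few low-numbered descriptors are ready it stops well before the end.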

> I mean, with 4000 active connections, whenever there's incoming data the loop above may have to go through up to 4000 sockets before reaching the right one. If all the connections are busy, the complexity would be O(n^2)!

I don't see where you get O(n^2) from. Surely, after returning from select() you'd go through the set once, processing each ready descriptor on the way. If you have 4,000 ready I/O descriptors, the overhead of looping through an array of 4,000 in-memory C objects is going to be fairly insignificant.

Answer 2:

You can.
The code below iterates over the set without calling FD_ISSET() for every FD in an array. You still have to touch every long in the fd_set's __fds_bits array, though.

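#include <sys/select.h>
#include <stdio.h>

int main(void)
{
    fd_set fds;
    FD_ZERO(&fds);
    // Fill the set.
    FD_SET(6, &fds); FD_SET(20, &fds); FD_SET(33, &fds); FD_SET(200, &fds);

    // glibc-specific: __FDS_BITS() exposes the set's internal bit array.
    // int, long or long long would also work (with the matching
    // __builtin_ctz variant); long is used because the internal type is long.
    unsigned long *m = (unsigned long *)__FDS_BITS(&fds);
    size_t i;
    for (i = 0; i < sizeof (fd_set) / sizeof (unsigned long); ++i)
    {
        int fd = (int)(sizeof (unsigned long) * 8 * i); // first fd in this word
        unsigned long w = m[i];         // work on a copy so the set survives
        while (w != 0)
        {
            int tz = __builtin_ctzl(w); // number of trailing zero bits in w
            fd += tz;
            printf("FD=%d\n", fd);      // found an FD
            // Shift past the found bit in two steps: a single shift by
            // tz + 1 is undefined when tz + 1 equals the word width.
            w >>= tz;
            w >>= 1;
            ++fd;
        }
    }
    return 0;
}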

This works fine for me with gcc (SUSE Linux) 4.6.2.

Background

On my system an fd_set looks like this (extracted and simplified from /usr/include/sys/select.h):

typedef struct {
    __fd_mask __fds_bits[__FD_SETSIZE/__NFDBITS];
} fd_set;

with __fd_mask being a typedef for long int.
__FD_SETSIZE/__NFDBITS seems to be 16 on my system.
So the array holds (__FD_SETSIZE/__NFDBITS)*sizeof(__fd_mask)*8 bits.
Since __NFDBITS = 8*sizeof(__fd_mask), that is exactly __FD_SETSIZE bits.
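As a worked example of that arithmetic, assuming values typical of 64-bit glibc (__FD_SETSIZE = 1024 and an 8-byte long, which reproduces the 16 quoted above):

```latex
\[
\texttt{\_\_NFDBITS} = 8 \cdot \texttt{sizeof(\_\_fd\_mask)} = 8 \cdot 8 = 64,
\qquad
\frac{\texttt{\_\_FD\_SETSIZE}}{\texttt{\_\_NFDBITS}} = \frac{1024}{64} = 16,
\qquad
16 \cdot 64 = 1024 = \texttt{\_\_FD\_SETSIZE}.
\]
```

So the outer loop of the scan runs 16 times regardless of how many descriptors are set.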

Looking at the definitions of the actual macros in /usr/include/bits/select.h reveals that __fds_bits is used to store the fds: when fd n is set, bit n of __fds_bits (counting across the whole array) is set to 1.

i386 (among other) processors have a single instruction to count the trailing zero bits of a number. With that count you can shift the __fds_bits entry right by count + 1 and repeat to find the next set bit (in C this is safest as two shifts, since shifting a word by its full width is undefined).

Compared with looping over every entry of your fd_set, this needs at least __FD_SETSIZE/__NFDBITS = 16 loop iterations (one per word), when not optimising with the return value of select().

But make sure that your processor and compiler support this operation for the chosen word type. If the compiler falls back to a more complex software implementation instead of the bit instruction, this approach might end up slower.

Whether this is actually faster than looping over the known fds remains to be measured.
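The same trailing-zero scan can be packaged as a function that returns the set descriptors in ascending order without modifying the caller's set. This variant swaps the two shifts for `w &= w - 1`, which clears the lowest set bit and never shifts by the full word width. It assumes, as this answer does for glibc, that descriptor n is bit n % W of word n / W, with W the number of bits in an unsigned long. Copying the set with memcpy avoids the glibc-only __FDS_BITS macro, but POSIX does not guarantee that layout, so each hit is cross-checked with FD_ISSET() in an assert. __builtin_ctzl is a GCC/Clang builtin; fdset_collect is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/select.h>

/* Write the descriptors present in `set` to `out` in ascending order, at most
   `max_out` of them, and return how many were written. `set` is not modified.
   ASSUMPTION: descriptor n is bit (n % W) of word (n / W), W = bits per
   unsigned long, as on glibc and musl; POSIX does not promise this layout. */
int fdset_collect(fd_set *set, int *out, int max_out)
{
    enum { W = (int)(sizeof (unsigned long) * CHAR_BIT) };
    unsigned long words[sizeof (fd_set) / sizeof (unsigned long)];
    size_t nwords = sizeof words / sizeof words[0];
    int n = 0;

    memcpy(words, set, sizeof words);   /* private copy of the bit array */

    for (size_t i = 0; i < nwords && n < max_out; ++i) {
        unsigned long w = words[i];
        while (w != 0 && n < max_out) {
            /* Trailing zero count = index of the lowest set bit. */
            int fd = (int)i * W + __builtin_ctzl(w);
            assert(FD_ISSET(fd, set));    /* cross-check the assumed layout */
            out[n++] = fd;
            w &= w - 1;                   /* clear the bit just reported */
        }
    }
    return n;
}
```

With the set from the example above (6, 20, 33, 200), this fills `out` with those four descriptors in order; a descriptor in the top bit of a word, such as 63 with 64-bit longs, is handled without any special case.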

Answer 3:

It's highly likely that you already have a data structure associated with each open file descriptor that is being select()ed. You'll already need a way of referencing it when demultiplexing the fd_sets returned from select().

If the number of file descriptors is significantly smaller than FD_SETSIZE, you are probably better off iterating over this structure (i.e. just the file descriptors that are open) and using FD_ISSET() to check for activity.
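A minimal sketch of this approach, assuming a hypothetical `struct conn` array with one entry per open connection: the loop runs once per open connection rather than once per possible descriptor up to FD_SETSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical per-connection record; a real server would also keep
   buffers, protocol state, etc. */
struct conn {
    int fd;
    int readable;   /* 1 if select() reported incoming data */
};

/* Walk the open connections (not every possible fd) and flag the ones
   whose descriptor select() left set in `ready`. Returns the count. */
size_t mark_readable(struct conn *conns, size_t count, fd_set *ready)
{
    size_t marked = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(conns[i].fd >= 0 && conns[i].fd < FD_SETSIZE);
        conns[i].readable = FD_ISSET(conns[i].fd, ready) ? 1 : 0;
        marked += (size_t)conns[i].readable;
    }
    return marked;
}
```

Because the per-connection data is reached directly from the array entry, no separate fd-to-connection lookup is needed after select() returns.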

Source: https://stackoverflow.com/questions/5474232/how-to-loop-through-only-active-file-descriptors-from-fd-set-result-from-select

Tags

posix

pipe