Switching Thread Contexts with SIGALRM

Question

I have a problem. I need to implement a program that switches ucontext threads using a timer and SIGALRM, but I get a segmentation fault when I switch threads in my evict_thread function. I believe it is the result of a race condition, as it occurs at different times during the program's execution. Here is my evict_thread:

void evict_thread(int signal)
{
    // Check that there is more than one thread in the queue
    if ((int)list_length(runqueue) > 1)
    {
        // Remove the currently executing thread from the runqueue and store its id
        int evict_thread_id = list_shift_int(runqueue);

        // Place the thread at the back of the run queue
        list_append_int(runqueue, evict_thread_id);

        // Get the id of the thread that is now at the head of the run queue
        int exec_thread_id = list_item_int(runqueue, 0);

        // Set the start time for new thread to the current time
        clock_gettime(CLOCK_REALTIME, &thread_table[exec_thread_id]->start);

        printf("Switching context from %s to %s\n",
            thread_table[evict_thread_id]->thread_name,
            thread_table[exec_thread_id]->thread_name);

        // Execute the thread at the head of the run queue
        if (swapcontext(&thread_table[evict_thread_id]->context, &thread_table[exec_thread_id]->context) == -1)
        {
            perror("swapcontext failed\n");
            printf("errno: %d.\n", errno);
            return;
        }
    }
    return;
}

The function above is installed as the SIGALRM handler, and the timer is started, in the following manner:

// Set the SIGALRM
if (sigset(SIGALRM, evict_thread) == -1)
{
    perror("sigset failed\n");
    printf("errno: %d.\n", errno);
    return;
}

// Initialize timer
thread_switcher.it_interval.tv_sec  = 0;
thread_switcher.it_interval.tv_usec = quantum_size;
thread_switcher.it_value.tv_sec = 0;
thread_switcher.it_value.tv_usec =  quantum_size;
setitimer(ITIMER_REAL, &thread_switcher, 0);

The run queue is simply a global list of integers that are indices into a global table of pointers to the ucontext threads. The list is implemented using the list data structure from a general-purpose C utility library available at libslack.org.

When I disable the timer and let each thread run to completion before switching contexts the program runs properly, but when the threads are switched during execution I get a segmentation fault around 80% of the time.

Also, when I attempt to use gdb to backtrace the segmentation fault, it says that it occurs within a system call.

Answer 1:

Remember that signal handlers run asynchronously to your main code. The man 7 signal page is worth a careful read to ensure that you are adhering to the guidelines. For example, the section on async-signal-safe functions makes no mention of printf or of other functions such as swapcontext. That means you can't reliably call these functions from a signal handler.

In general, try to do as little work in your signal handler as possible. Usually this would just mean setting a flag of type sig_atomic_t in the signal handler, then checking the state of this flag in your main loop.
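The flag approach described above might look roughly like the following sketch. The names on_alarm, tick_pending and run_ticks are hypothetical and not part of the question's code, and sigaction is used here in place of the question's sigset. The handler does nothing but set a flag; the loop in ordinary code is where a scheduling decision would be made.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical flag: the only state the handler is allowed to touch. */
static volatile sig_atomic_t tick_pending = 0;

static void on_alarm(int signo)
{
    (void)signo;
    tick_pending = 1;               /* no printf, no list updates, no swapcontext */
}

/* Hypothetical helper: handles `wanted` timer ticks in ordinary code.
 * Returns the number of ticks handled, or -1 on error. */
int run_ticks(int wanted, long quantum_usec)
{
    struct sigaction sa;
    struct itimerval timer, stop;
    sigset_t block, old, wait_mask;
    int handled = 0;

    assert(quantum_usec > 0 && quantum_usec < 1000000);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    /* Keep SIGALRM blocked except while waiting inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);

    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_usec = quantum_usec;
    timer.it_value.tv_usec = quantum_usec;
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;

    while (handled < wanted) {
        sigsuspend(&wait_mask);     /* returns after a handler has run */
        if (tick_pending) {
            tick_pending = 0;
            handled++;              /* the run queue would be rotated here */
        }
    }

    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);    /* disarm the timer */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return handled;
}
```

Calling run_ticks(3, 10000) from main would handle three 10 ms ticks and return 3. Note that this only moves the bookkeeping out of the handler; a thread that never returns to the loop is not preempted by it.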

Perhaps rearrange your code so that the context switching occurs in the main loop, not from a signal handler. You might be able to use sigwait in the main loop to wait for the timer signal.
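A minimal sketch of the sigwait idea, assuming a single-threaded process; wait_for_quanta is a hypothetical name. SIGALRM is kept blocked so that no handler ever runs, and the loop consumes each timer expiry synchronously.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper: waits for `wanted` timer expirations with sigwait().
 * Returns the number of expirations seen, or -1 on error. */
int wait_for_quanta(int wanted, long quantum_usec)
{
    sigset_t alarm_set, saved, pending;
    struct itimerval timer, stop;
    int signo, seen = 0;

    assert(quantum_usec > 0 && quantum_usec < 1000000);

    /* Block SIGALRM so it is never delivered asynchronously. */
    sigemptyset(&alarm_set);
    sigaddset(&alarm_set, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &alarm_set, &saved) == -1)
        return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_usec = quantum_usec;
    timer.it_value.tv_usec = quantum_usec;
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;

    while (seen < wanted) {
        if (sigwait(&alarm_set, &signo) != 0)
            break;
        seen++;     /* quantum expired: ordinary code, safe to touch the run queue */
    }

    /* Disarm the timer and consume a possibly pending SIGALRM before
     * unblocking, because its default action would terminate the process. */
    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);
    if (sigpending(&pending) == 0 && sigismember(&pending, SIGALRM) == 1)
        sigwait(&alarm_set, &signo);
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return seen;
}
```

Because sigwait blocks the caller, this fits a design in which the worker contexts hand control back to the main loop; it does not by itself interrupt a running context.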

Answer 2:

I can't give you any advice on how to make it work, but here are a few points on what's not working:

Signal handlers run asynchronously with respect to your other code. For example, the signal might arrive while some code is updating your runqueue, and when the signal handler then runs list_append_int(runqueue, evict_thread_id); you have a rather serious race condition.
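One common way to close that particular window is to block SIGALRM around every update of the shared queue. The sketch below assumes a single-threaded process and uses a toy array in place of the libslack list; block_alarm, restore_mask, rotate_queue and demo_deferred_delivery are hypothetical names. It uses raise() to simulate a tick arriving in the middle of an update.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

#define QUEUE_LEN 3
static int run_queue[QUEUE_LEN] = {0, 1, 2};    /* toy stand-in for the real list */
static volatile sig_atomic_t handler_runs = 0;

static void on_alarm(int signo)
{
    (void)signo;
    handler_runs++;
}

/* Block SIGALRM and remember the previous mask. */
static int block_alarm(sigset_t *saved)
{
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    return sigprocmask(SIG_BLOCK, &block, saved);
}

/* Restore the previous mask; a pending SIGALRM is delivered at this point. */
static int restore_mask(const sigset_t *saved)
{
    return sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Move the head of the queue to the back. */
static void rotate_queue(void)
{
    int head = run_queue[0];
    memmove(run_queue, run_queue + 1, (QUEUE_LEN - 1) * sizeof run_queue[0]);
    run_queue[QUEUE_LEN - 1] = head;
}

/* Returns 1 if the handler was held off during the update and ran afterwards. */
int demo_deferred_delivery(void)
{
    struct sigaction sa;
    sigset_t saved;
    int ran_inside;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    if (block_alarm(&saved) == -1)
        return -1;
    raise(SIGALRM);                 /* simulate a tick arriving mid-update */
    rotate_queue();                 /* cannot be interrupted by the handler */
    ran_inside = handler_runs;
    if (restore_mask(&saved) == -1)
        return -1;

    return ran_inside == 0 && handler_runs == 1;
}
```

This protects the queue itself, but it does not make printf or swapcontext safe to call from the handler.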

printf() should not be called in a signal handler; it can deadlock or worse. Here's a list of functions that are safe to call in a signal handler. setcontext/swapcontext are not listed as safe to call in a signal handler, though the Linux man page says you can call setcontext() in a signal handler. I'm not sure which is authoritative on this.

Also note what the manpage for setcontext() says:

When a signal occurs, the current user context is saved and a new context is created by the kernel for the signal handler.

So when you issue swapcontext(), you might be saving a context of the signal handler, instead of the current context that was running before the signal kicked in.
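For contrast, here is a sketch of a switch made from ordinary code rather than from a handler, where swapcontext saves exactly the context of the code that called it. The scheduler loop, the worker function and the fixed 64 KiB stacks are illustrative assumptions, not the question's design; the workers yield cooperatively instead of being preempted.

```c
#define _XOPEN_SOURCE 700
#include <ucontext.h>

#define NUM_WORKERS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;
static ucontext_t worker_ctx[NUM_WORKERS];
static char stacks[NUM_WORKERS][STACK_SIZE];
static int steps_done[NUM_WORKERS];
static int finished[NUM_WORKERS];

/* Each worker does three steps and yields to the scheduler after each one. */
static void worker(int id)
{
    for (int i = 0; i < 3; i++) {
        steps_done[id]++;
        swapcontext(&worker_ctx[id], &scheduler_ctx);   /* voluntary yield */
    }
    finished[id] = 1;
    /* Returning resumes uc_link, which is the scheduler context. */
}

/* Hypothetical round-robin loop; returns the total number of steps executed,
 * or -1 on error. */
int run_round_robin(void)
{
    for (int id = 0; id < NUM_WORKERS; id++) {
        if (getcontext(&worker_ctx[id]) == -1)
            return -1;
        worker_ctx[id].uc_stack.ss_sp = stacks[id];
        worker_ctx[id].uc_stack.ss_size = STACK_SIZE;
        worker_ctx[id].uc_link = &scheduler_ctx;
        makecontext(&worker_ctx[id], (void (*)(void))worker, 1, id);
    }

    while (!finished[0] || !finished[1]) {
        for (int id = 0; id < NUM_WORKERS; id++) {
            if (!finished[id] &&
                swapcontext(&scheduler_ctx, &worker_ctx[id]) == -1)
                return -1;
        }
    }
    return steps_done[0] + steps_done[1];
}
```

With two workers doing three steps each, run_round_robin() returns 6. Every context saved here belongs to normal code, so none of them captures a signal handler's frame.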

Answer 3:

As a guess: you're passing something to the kernel that is not visible from there because you switch context. You are asking about a segfault, but your code is doing interesting things.

Perhaps if you considered a more standard model for thread scheduling, you could avoid the problems. Instead of trying to schedule the threads using context switches, there are other ways to do this. And you could call them from your evict_thread function, using exactly the same program model you have now.

Some of this suggestion is a bit system specific. If you can tell us what your OS is we can find something that is good for your situation. Or you can check it out for yourself.

Read about POSIX thread scheduling. Pay special attention to SCHED_FIFO, which will work with your model.

https://computing.llnl.gov/tutorials/pthreads/man/sched_setscheduler.txt

This applies generally to using the POSIX thread library to schedule threads, instead of you trying to do it the hard way.

Source: https://stackoverflow.com/questions/15398556/switching-thread-contexts-with-sigalrm

Tags

multithreading

race-condition

ucontext