Question
I'm new to semaphores and want to add multithreading to my program, but I cannot get around the following problem: sem_wait() should be able to fail with EINTR and unblock, as long as I didn't set the SA_RESTART flag. I am sending SIGUSR1 to the worker thread that is blocked in sem_wait(). It does receive the signal and gets interrupted, but it then continues to block, so it never gives me a -1 return code together with errno = EINTR. However, if I do a sem_post from the main thread, it unblocks and gives me an errno of EINTR but an RC of 0. I am totally puzzled by this behavior. Is it some weird NetBSD implementation, or am I doing something wrong here? According to the man page, sem_wait conforms to POSIX.1 (ISO/IEC 9945-1:1996). A simple example:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <signal.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h> /* sleep() */
typedef struct workQueue_s
{
int full;
int empty;
sem_t work;
int sock_c[10];
} workQueue_t;
void signal_handler( int sig )
{
switch( sig )
{
case SIGUSR1:
printf( "Signal: I am pthread %p\n", pthread_self() );
break;
}
}
/* errno comes from <errno.h>; declaring it by hand is not portable in threaded code. */
workQueue_t queue;
pthread_t workerbees[8];
void *BeeWork( void *t )
{
int RC;
pthread_t tid;
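struct sigaction sa;
/* Initialize every field used: an uninitialized sa_flags could happen to contain SA_RESTART. */
sa.sa_handler = signal_handler;
sigemptyset( &sa.sa_mask );
sa.sa_flags = 0;
sigaction( SIGUSR1, &sa, NULL );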
printf( "Bee: I am pthread %p\n", pthread_self() );
RC = sem_wait( &queue.work );
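/* Note: errno is only meaningful when sem_wait() returned -1; after RC = 0 it may hold a stale value. */
printf( "Bee: got RC = %d and errno = %d\n", RC, errno );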
RC = sem_wait( &queue.work );
printf( "Bee: got RC = %d and errno = %d\n", RC, errno );
pthread_exit( ( void * ) t );
}
int main()
{
int RC;
long tid = 0;
pthread_attr_t attr;
pthread_attr_init( &attr );
pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_JOINABLE );
queue.full = 0;
queue.empty = 0;
sem_init( &queue.work, 0, 0 );
printf( "I am pthread %p\n", pthread_self() );
pthread_create( &workerbees[tid], &attr, BeeWork, ( void * ) tid );
pthread_attr_destroy( &attr );
sleep( 2 );
sem_post( &queue.work );
sleep( 2 );
pthread_kill( workerbees[tid], SIGUSR1 );
sleep( 2 );
// Remove this and sem_wait will stay blocked
sem_post( &queue.work );
sleep( 2 );
return( 0 );
}
I know printf is not allowed in a signal handler; it is there just for the heck of it, and if I remove it I get the same results.
These are the results without sem_post:
I am pthread 0x7f7fffc00000
Bee: I am pthread 0x7f7ff6c00000
Bee: got RC = 0 and errno = 0
Signal: I am pthread 0x7f7ff6c00000
And with the sem_post:
I am pthread 0x7f7fffc00000
Bee: I am pthread 0x7f7ff6c00000
Bee: got RC = 0 and errno = 0
Signal: I am pthread 0x7f7ff6c00000
Bee: got RC = 0 and errno = 4
I know I don't really need to unblock and can simply exit from main, but I want to see it working anyway. The reason I'm using sem_wait is that I want to keep the worker threads alive and have the main thread wake up the one that has been waiting the longest with sem_post, as soon as there is a new client connection from Postfix. I don't want to call pthread_create all the time, since I will receive calls multiple times per second and I don't want to lose speed and make Postfix unresponsive to new smtpd clients. It is a policy daemon for Postfix and the server is quite busy.
Am I missing something here? Is NetBSD just messed up with this?
Answer 1:
My post is about behaviour on Linux, but I think you may have similar behaviour, or at least I thought it could be helpful. If not, let me know and I'll remove this useless 'noise'.
I tried to reproduce your setup and I was quite astonished to see what you describe happening. Looking deeper helped me figure out that something more subtle was actually going on; if you have a look at strace, you'll see something like:
[pid 6984] futex(0x6020e8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 6983] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid 6983] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 6983] nanosleep({2, 0}, 0x7fffe5794a70) = 0
[pid 6983] tgkill(6983, 6984, SIGUSR1 <unfinished ...>
[pid 6984] <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 6983] <... tgkill resumed> ) = 0
[pid 6984] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=6983, si_uid=500} ---
[pid 6983] rt_sigprocmask(SIG_BLOCK, [CHLD], <unfinished ...>
[pid 6984] rt_sigreturn( <unfinished ...>
[pid 6983] <... rt_sigprocmask resumed> [], 8) = 0
[pid 6984] <... rt_sigreturn resumed> ) = -1 EINTR (Interrupted system call)
See the lines with ERESTARTSYS and EINTR: the system call reported as interrupted is actually rt_sigreturn, not futex (the system call underlying sem_wait), as you expected.
I must say I was quite puzzled, but reading the man page gave some interesting clues (man 7 signal):
If a blocked call to one of the following interfaces is interrupted by
a signal handler, then the call will be automatically restarted after
the signal handler returns if the SA_RESTART flag was used; otherwise
the call will fail with the error EINTR:
[...]
* futex(2) FUTEX_WAIT (since Linux 2.6.22; beforehand, always
failed with EINTR).
So I guess your kernel has a similar behaviour (see the NetBSD documentation?), and what you observe is the system call being restarted automatically without you ever getting a chance to see it.
That said, I completely removed the sem_post() from your program and just sent the signal to 'break' the sem_wait(), and looking at strace I saw (filtering on the bee thread):
[pid 8309] futex(0x7fffc0470990, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 8309] <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 8309] --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=8308, si_uid=500} ---
[pid 8309] rt_sigreturn() = -1 EINTR (Interrupted system call)
[pid 8309] madvise(0x7fd5f6019000, 8368128, MADV_DONTNEED) = 0
[pid 8309] _exit(0)
I must say I don't understand all the details, but the kernel seems to work out what I am trying to do and makes the whole thing behave correctly:
Bee: got RC = -1 and errno = Interrupted system call
Answer 2:
Thanks for your answer, OznOg. If I remove the last sem_post and make the last sleep a little longer, I get this with ktrace:
PSIG SIGUSR1 caught handler=0x40035c mask=(): code=SI_LWP sent by pid=10631, uid=0)
CALL write(1,0x7f7ff7e04000,0x24)
GIO fd 1 wrote 36 bytes "Signal: I am pthread 0x7f7ff7800000\n"
RET write 36/0x24
CALL setcontext(0x7f7ff7bff970)
RET setcontext JUSTRETURN
CALL ___lwp_park50(0,0,0x7f7ff7e01100,0x7f7ff7e01100)
RET __nanosleep50 0
CALL exit(0)
RET ___lwp_park50 -1 errno 4 Interrupted system call
It seems that sem_wait will only return on either an exit or a sem_post.
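If the goal is only to let idle workers finish, one option that does not depend on signals at all is to set a stop flag and call sem_post once per worker. Below is a rough sketch, assuming a fixed-size pool; struct pool, sem_wait_retry, worker, run_pool, and NUM_WORKERS are hypothetical names, and a counter stands in for handling a client connection.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NUM_WORKERS 4   /* hypothetical pool size */

struct pool {
    sem_t work;         /* posted once per job, and once per worker at shutdown */
    sem_t done;         /* posted by a worker after each finished job */
    atomic_int stop;
    atomic_int handled;
    pthread_t threads[NUM_WORKERS];
};

/* sem_wait() that retries when a signal handler interrupts it. */
static int sem_wait_retry(sem_t *s)
{
    int rc;

    do {
        rc = sem_wait(s);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

static void *worker(void *arg)
{
    struct pool *p = arg;

    for (;;) {
        if (sem_wait_retry(&p->work) == -1 || atomic_load(&p->stop))
            break;
        atomic_fetch_add(&p->handled, 1);   /* stand-in for serving one client */
        sem_post(&p->done);
    }
    return NULL;
}

/* Runs `jobs` dummy jobs through the pool; returns how many were handled, or -1. */
static int run_pool(int jobs)
{
    struct pool p;
    int started = 0, result = -1;

    if (sem_init(&p.work, 0, 0) != 0)
        return -1;
    if (sem_init(&p.done, 0, 0) != 0) {
        sem_destroy(&p.work);
        return -1;
    }
    atomic_init(&p.stop, 0);
    atomic_init(&p.handled, 0);

    while (started < NUM_WORKERS &&
           pthread_create(&p.threads[started], NULL, worker, &p) == 0)
        started++;

    if (started > 0) {
        for (int i = 0; i < jobs; i++)
            sem_post(&p.work);          /* wake one worker per job */
        for (int i = 0; i < jobs; i++)
            sem_wait_retry(&p.done);    /* wait until every job is finished */

        atomic_store(&p.stop, 1);       /* then ask the workers to exit */
        for (int i = 0; i < started; i++)
            sem_post(&p.work);
        for (int i = 0; i < started; i++)
            pthread_join(p.threads[i], NULL);
        result = atomic_load(&p.handled);
    }
    sem_destroy(&p.done);
    sem_destroy(&p.work);
    return result;
}
```

Because each worker checks the flag right after waking, every shutdown sem_post releases exactly one thread, and the loop around sem_wait keeps a stray signal from being mistaken for work.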
Source: https://stackoverflow.com/questions/33853901/sem-wait-not-unblocking-with-eintr