How do I recover a semaphore when the process that decremented it to zero crashes?

轻奢々 2020-12-02 13:23

I have multiple apps compiled with g++, running on Ubuntu. I'm using named semaphores to coordinate between different processes.

All works fine except in

8 answers
  • 2020-12-02 13:32

    This is a typical problem when managing semaphores. Some programs use a single process to manage the initialization and deletion of the semaphore. Usually this process does just this and nothing else. Your other applications can wait until the semaphore is available. I've seen this done with the SysV API, but not with POSIX. It is similar to what 'Duck' mentioned: use the SEM_UNDO flag in your semop() call.

    But with the information that you've provided, I would suggest that you do not use semaphores, especially if your process is in danger of being killed or crashing. Try to use something that the OS will clean up automatically for you.

  • 2020-12-02 13:33

    If you use a named semaphore, then you can use an algorithm like the one used in lsof or fuser.

    Take these points into consideration:

    1. Each named POSIX semaphore creates a file in a tmpfs file system, usually under the path:

    /dev/shm/
    

    2. On Linux, each process has a map_files directory under the path:

    /proc/[PID]/map_files/
    

    These map files show which file each region of the process's memory is mapped to.

    So using these steps, you can find whether the named semaphore is still opened by another process or not:

    1- (Optional) Find the exact path of the named semaphore (in case it's not under /dev/shm)

    • First open the named semaphore in the new process and assign the result to a pointer
    • Take the address held in the pointer (usually by casting the pointer to an integer type), convert it to a hexadecimal number (e.g. 0xffff1234), and then use this path:

      /proc/self/map_files/ffff1234-*

      There should be only one file that matches this pattern.

    • Get the symbolic link target of that file. It is the full path of the named semaphore.

    2- Iterate over all processes to find a map file whose symbolic link target matches the full path of the named semaphore. If there is one, then the semaphore is really in use; if there is none, then you can safely unlink the named semaphore and reopen it for your own use.

    UPDATE

    In step 2, when iterating over all processes, instead of iterating over all files in the map_files folder, it is better to read the file /proc/[PID]/maps and search it for the full path of the named semaphore file (e.g. /dev/shm/sem.xyz). With this approach, even if some other program has unlinked the named semaphore while other processes are still using it, it can still be found, but with "(deleted)" appended to the end of its file path.
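The scan over /proc/[PID]/maps described in the update could look roughly like the sketch below. It assumes Linux with /proc mounted; the names line_mentions_path and count_semaphore_users are made up for illustration, and the caller supplies the full path of the semaphore file. Processes whose maps file cannot be read (for example because of permissions) are skipped, so a result of 0 is only trustworthy when run with enough privileges.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if one line of a maps file names `path` as its mapped file,
 * either exactly or followed by " (deleted)"; return 0 otherwise. */
static int line_mentions_path(const char *line, const char *path)
{
    const char *hit = strstr(line, path);
    if (hit == NULL)
        return 0;
    if (hit != line && hit[-1] != ' ')
        return 0;                       /* only a suffix of a longer path */
    char next = hit[strlen(path)];
    return next == '\n' || next == '\0' || next == ' ';
}

/* Count the processes whose /proc/[PID]/maps refers to `path`.
 * Returns -1 if /proc cannot be opened. */
int count_semaphore_users(const char *path)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int users = 0;
    struct dirent *entry;
    while ((entry = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)entry->d_name[0]))
            continue;                   /* not a PID directory */

        char maps_path[300];
        snprintf(maps_path, sizeof maps_path, "/proc/%s/maps", entry->d_name);

        FILE *maps = fopen(maps_path, "r");
        if (maps == NULL)
            continue;                   /* process exited or access denied */

        char line[4352];
        while (fgets(line, sizeof line, maps) != NULL) {
            if (line_mentions_path(line, path)) {
                users++;
                break;                  /* count each process once */
            }
        }
        fclose(maps);
    }
    closedir(proc);
    return users;
}
```

If count_semaphore_users("/dev/shm/sem.xyz") returns 0, no readable process has the semaphore mapped any more. Note that the calling process counts itself if it has the semaphore open.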

  • 2020-12-02 13:36

    sem_post() is async-signal-safe, so it can be called from a signal handler. If you are able to catch some of the situations that bring down the process, this might help.

    Unlike a mutex any process or thread (with permissions) can post to the semaphore. You can write a simple utility to reset it. Presumably you know when your system has deadlocked. You can bring it down and run the utility program.
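A reset utility of that kind might be built around a function like the one below. This is a minimal sketch, not a tested tool: the name reset_named_semaphore and the idea of posting until a target count is reached are assumptions for illustration. It is only safe to run while the client processes are stopped, since nothing prevents a live process from waiting on the semaphore between the calls.

```c
#include <fcntl.h>
#include <semaphore.h>

/* Post to an existing named semaphore until its value reaches `target`.
 * Returns the final value, or -1 if the semaphore cannot be opened or
 * one of the calls fails. */
int reset_named_semaphore(const char *name, int target)
{
    sem_t *sem = sem_open(name, 0);     /* open only, never create */
    if (sem == SEM_FAILED)
        return -1;

    int value = 0;
    int rc = sem_getvalue(sem, &value);
    while (rc == 0 && value < target) {
        rc = sem_post(sem);
        if (rc == 0)
            rc = sem_getvalue(sem, &value);
    }

    sem_close(sem);
    return rc == 0 ? value : -1;
}
```

For a semaphore used as a simple lock, calling it with a target of 1 puts the count back to "available". On older glibc versions the program has to be linked with -pthread.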

    Also, the semaphore is usually listed under /dev/shm, and you can remove it there.

    SysV semaphores are more accommodating for this scenario. You can specify SEM_UNDO, in which case the system will back out changes to the semaphore made by a process if it dies. They also let you query the ID of the last process to alter the semaphore.
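To illustrate the SEM_UNDO behaviour, here is a small sketch in which a child process takes a SysV semaphore and is then killed without releasing it. The helper names (undo_sem_create, undo_sem_op, undo_sem_demo) and the union name are invented for this example, and it assumes Linux, where the caller has to declare the semctl() argument union itself.

```c
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Argument type for semctl(); on Linux the program must declare it. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private one-element semaphore set with the given value.
 * Returns the set id, or -1 on failure. */
int undo_sem_create(int initial)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    union sem_arg arg;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Apply `delta` with SEM_UNDO, so the kernel reverts the change
 * when the calling process terminates for any reason. */
int undo_sem_op(int id, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* A child takes the semaphore and dies while holding it. Returns the
 * value the parent sees afterwards, or -1 on a setup error. */
int undo_sem_demo(void)
{
    int id = undo_sem_create(1);
    if (id == -1)
        return -1;

    int value = -1;
    pid_t pid = fork();
    if (pid == 0) {
        if (undo_sem_op(id, -1) == -1)
            _exit(2);
        raise(SIGKILL);                 /* simulate a crash, no cleanup runs */
        _exit(3);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        value = semctl(id, 0, GETVAL);

    semctl(id, 0, IPC_RMID);            /* remove the demo semaphore set */
    return value;
}
```

In this sketch the parent should see the count back at 1 after the child has died, because the kernel undoes the child's decrement on exit.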

  • 2020-12-02 13:37

    It turns out there isn't a way to reliably recover the semaphore. Sure, anyone can sem_post() to the named semaphore to get the count to increase past zero again, but how can you tell when such a recovery is needed? The API provided is too limited and doesn't indicate in any way when this has happened.

    Beware of the ipc tools also available -- the common tools ipcmk, ipcrm, and ipcs are only for the outdated SysV semaphores. They specifically do not work with the new POSIX semaphores.

    But it looks like there are other things that can be used to lock things, which the operating system does automatically release when an application dies in a way that cannot be caught in a signal handler. Two examples: a listening socket bound to a particular port, or a lock on a specific file.

    I decided the lock on a file is the solution I needed. So instead of a sem_wait() and sem_post() call, I'm using:

    lockf( fd, F_LOCK, 0 )
    

    and

    lockf( fd, F_ULOCK, 0 )
    

    When the application exits in any way, the file is automatically closed, which also releases the file lock. Other client apps waiting for the "semaphore" are then free to proceed as expected.

    Thanks for the help, guys.

  • 2020-12-02 13:43

    If the process was killed with SIGKILL, there won't be any direct way to determine that it has gone away.

    You could operate some kind of periodic integrity check across all the semaphores you have - use semctl (cmd=GETPID) to find the PID for the last process that touched each semaphore in the state you describe, then check whether that process is still around. If not, perform clean up.
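Such an integrity check might be sketched as follows for SysV semaphores. The names sysv_sem_is_orphaned and orphan_check_demo are invented for the example, and the check is only a heuristic: PIDs can be reused, so a live PID does not prove that the original holder is still running.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Argument type for semctl(); on Linux the program must declare it. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Return 1 if the semaphore is at zero and the last process that
 * operated on it no longer exists, 0 if it looks healthy, -1 on error. */
int sysv_sem_is_orphaned(int semid, int semnum)
{
    int value = semctl(semid, semnum, GETVAL);
    if (value == -1)
        return -1;
    if (value > 0)
        return 0;                       /* still available, nothing to do */

    int pid = semctl(semid, semnum, GETPID);
    if (pid == -1)
        return -1;
    if (pid == 0)
        return 0;                       /* nobody has operated on it yet */

    if (kill(pid, 0) == 0)
        return 0;                       /* process exists */
    return errno == ESRCH ? 1 : 0;      /* EPERM also means it exists */
}

/* A child decrements the semaphore to zero and is killed. Returns what
 * the check reports afterwards, or -1 on a setup error. */
int orphan_check_demo(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int result = -1;
    union sem_arg arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            struct sembuf take = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
            if (semop(id, &take, 1) == -1)
                _exit(2);
            raise(SIGKILL);             /* die while holding the semaphore */
            _exit(3);
        }
        if (pid > 0 && waitpid(pid, NULL, 0) == pid)
            result = sysv_sem_is_orphaned(id, 0);
    }
    semctl(id, 0, IPC_RMID);            /* remove the demo semaphore set */
    return result;
}
```

When the check reports 1, the cleanup step could be a semop() that adds the missing count back.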

  • 2020-12-02 13:44

    You should be able to find it from the shell using lsof. Then possibly you can delete it?

    Update

    Ah yes... man -k semaphore to the rescue.

    It seems you can use ipcrm to get rid of a semaphore (note that ipcrm only handles SysV semaphores, not POSIX named ones). It seems you aren't the first with this problem.
