High availability computing: How to deal with a non-returning system call, without risking false positives?

前端未结

关注

 3  437

I have a process that\'s running on a Linux computer as part of a high-availability system. The process has a main thread that receives requests from the other computers on the

相关标签:

3条回答

暖寄归人

2021-01-27 00:42

I think you need a shared activity marker.

Have the main thread (or in a more general application, all worker threads) update the shared activity marker with the current time (or clock tick, e.g. by computing the "current" nanosecond from clock_gettime(CLOCK_MONOTONIC, ...)), and have the heartbeat thread periodically check when this activity marker was last updated, cancelling itself (and thus stopping the heartbeat broadcast) if there has not been any activity update within a reasonable time.

This scheme can easily be extended with a state flag if the workload is very sporadic. The main work thread sets the flag and updates the activity marker when it begins a unit of work, and clears the flag when the work has completed. If there is no work being done then the heartbeat is sent without checking the activity marker. If work is being done then the heartbeat is stopped if the time since the activity marker was updated exceeds the maximum processing time allowed for a unit of work. (Multiple worker threads each need their own activity marker and flag in this case, and the heartbeat thread can be designed to stop when any one worker thread gets stuck, or only when all worker threads get stuck, depending on their purposes and importance to the overall system).

(The activity marker value (and the work flag) will of course have to be protected by a mutex that must be acquired before reading or writing the value.)

Perhaps the heartbeat thread can also cause the whole process to commit suicide (e.g. kill(getpid(), SIGQUIT)) so that it can be restarted by having it be called in a loop in a wrapper script, especially if a process restart clears the condition in the kernel which would cause the problem in the first place.

0 讨论(0)
发布评论:

提交评论
- 加载中...
情书的邮戳

2021-01-27 01:01

One possible method would be to have another set of heartbeat messages from the main thread to the heartbeat thread. If it stops receiving messages for a certain amount of time, it stops sending them out as well. (And could try other recovery such as restarting the process.)

To solve the issue of the main thread actually just being in a long sleep, have a (properly-synchronized) flag that the heartbeat thread sets when it has decided that the main thread must have failed - and the main thread should check this flag at appropriate times (e.g. after the potential wait) to make sure that it hasn't been reported as dead. If it has, it stops running, because its job would have already been taken up by a different node.

The main thread can also send I-am-alive events to the heartbeat thread at other times than once around the loop - for example, if it's going into a long-running operation. Without this, there's no way to tell the difference between a failed main thread and a sleeping main thread.

0 讨论(0)
发布评论:

提交评论
- 加载中...

暗喜

2021-01-27 01:03

My second suggestion is to use ptrace to find the current instruction pointer. You can have a parent thread that ptraces your process and interrupts it every second to check the current RIP value. This is somewhat complex, so I've written a demonstration program: (x86_64 only, but that should be fixable by changing the register names.)

#define _GNU_SOURCE
#include <unistd.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <linux/ptrace.h>
#include <sys/user.h>
#include <time.h>

// this number is arbitrary - find a better one.
#define STACK_SIZE (1024 * 1024)

int main_thread(void *ptr) {
    // "main" thread is now running under the monitor
    printf("Hello from main!");
    while (1) {
        int c = getchar();
        if (c == EOF) { break; }
        nanosleep(&(struct timespec) {0, 200 * 1000 * 1000}, NULL);
        putchar(c);
    }
    return 0;
}

int main(int argc, char *argv[]) {
    void *vstack = malloc(STACK_SIZE);
    pid_t v;
    if (clone(main_thread, vstack + STACK_SIZE, CLONE_PARENT_SETTID | CLONE_FILES | CLONE_FS | CLONE_IO, NULL, &v) == -1) { // you'll want to check these flags
        perror("failed to spawn child task");
        return 3;
    }
    printf("Target: %d; %d\n", v, getpid());
    long ptv = ptrace(PTRACE_SEIZE, v, NULL, NULL);
    if (ptv == -1) {
        perror("failed monitor sieze");
        exit(1);
    }
    struct user_regs_struct regs;
    fprintf(stderr, "beginning monitor...\n");
    while (1) {
        sleep(1);
        long ptv = ptrace(PTRACE_INTERRUPT, v, NULL, NULL);
        if (ptv == -1) {
            perror("failed to interrupt main thread");
            break;
        }
        int status;
        if (waitpid(v, &status, __WCLONE) == -1) {
            perror("target wait failed");
            break;
        }
        if (!WIFSTOPPED(status)) { // this section is messy. do it better.
            fputs("target wait went wrong", stderr);
            break;
        }
        if ((status >> 8) != (SIGTRAP | PTRACE_EVENT_STOP << 8)) {
            fputs("target wait went wrong (2)", stderr);
            break;
        }
        ptv = ptrace(PTRACE_GETREGS, v, NULL, &regs);
        if (ptv == -1) {
            perror("failed to peek at registers of thread");
            break;
        }
        fprintf(stderr, "%d -> RIP %x RSP %x\n", time(NULL), regs.rip, regs.rsp);
        ptv = ptrace(PTRACE_CONT, v, NULL, NULL);
        if (ptv == -1) {
            perror("failed to resume main thread");
            break;
        }
    }
    return 2;
}

Note that this is not production-quality code. You'll need to do a bunch of fixing things up.

Based on this, you should be able to figure out whether or not the program counter is advancing, and could combine this with other pieces of information (such as /proc/PID/status) to find if it's busy in a system call. You might also be able to extend the usage of ptrace to check what system calls are being used, so that you can check if it's a reasonable one to be waiting on.

This is a hacky solution, but I don't think that you'll find a non-hacky solution for this problem. Despite the hackiness, I don't think (this is untested) that it would be particularly slow; my implementation pauses the monitored thread once per second for a very short amount of time - which I would guess would be in the 100s of microseconds range. That's around 0.01% efficiency loss, theoretically.

0 讨论(0)