How to know if a process had been started but crashed in Linux

Question

Consider the following situation: I am using Linux and suspect that my application has crashed. I had not enabled core dumps, and there is no information in the log.

How can I be sure that my app was started after the system restart, but is no longer running because it crashed?

My app is configured as a service, written in C/C++.

Put another way: how can I get the names of all processes/services that have executed since system start? Is that even possible?

I know I can enable logging and start the process again to catch the crash.

Answer 1:

Standard practice is to have a pid file for your daemon (/var/run/$NAME.pid), in which you can find its process ID without having to parse the process tree manually. You can then either check the state of that process, or make your daemon respond to a signal (usually SIGHUP) and report its status. Because PIDs are reused, it's a good idea to make sure that this PID still belongs to your process too, and the easiest way is to check /proc/$PID/cmdline.
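A minimal sketch of the pid-file check described above, assuming a Linux /proc filesystem. The function names read_pid_file and pid_runs_program are hypothetical, and comparing only the basename of argv[0] is an assumption that will not fit every daemon (for example, one started through an interpreter).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read a PID from a pid file.
 * Returns the PID, or -1 if the file is missing or does not start
 * with a positive integer. */
pid_t read_pid_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(f);
    return (pid_t)pid;
}

/* Hypothetical helper: return 1 if /proc/<pid>/cmdline is readable and
 * the basename of argv[0] equals expected_name, 0 otherwise. */
int pid_runs_program(pid_t pid, const char *expected_name)
{
    char path[64];
    char buf[4096];
    const char *base;
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;               /* no such process, or /proc unavailable */
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';              /* arguments are NUL-separated, so buf
                                   now reads as argv[0] only */

    base = strrchr(buf, '/');
    base = base ? base + 1 : buf;
    return strcmp(base, expected_name) == 0;
}
```

A caller would combine the two, e.g. pid_runs_program(read_pid_file("/var/run/myapp.pid"), "myapp"), where "myapp" stands in for the daemon's real name. A result of 0 with an existing pid file suggests the daemon started and then died.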

Addendum: If you're only using newer Fedora or Ubuntu, your init system is Upstart, which has monitoring and triggering capabilities built in.

As @emg-2 noted, BSD process accounting is available, but I don't think it's the correct approach for this situation.

Answer 2:

This feature is included in the Linux kernel. It's called BSD process accounting.

Answer 3:

I would recommend that you record the fact that you started in some kind of log file, either a private one which gets overwritten on each startup or one written via syslogd.

Also, you can log a timestamp heartbeat so that you know exactly when it crashed.
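As a rough sketch of both suggestions, the following logs a startup message through syslog(3) and overwrites a private heartbeat file with the current time. The names log_startup and write_heartbeat, the LOG_DAEMON facility, and the file format (one Unix timestamp per file) are assumptions made for illustration.

```c
#include <stdio.h>
#include <syslog.h>
#include <time.h>

/* Hypothetical helper: record "started" in the system log.
 * openlog() keeps the ident pointer rather than copying the string,
 * so ident must stay valid for as long as the process logs. */
void log_startup(const char *ident)
{
    openlog(ident, LOG_PID, LOG_DAEMON);
    syslog(LOG_INFO, "started");
}

/* Hypothetical helper: overwrite the file at path with the current
 * Unix time. Returns 0 on success, -1 on failure. */
int write_heartbeat(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", (long)time(NULL)) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The daemon would call log_startup once at the top of main and write_heartbeat from its main loop at a fixed interval. After a crash, the last timestamp in the file bounds the time of death to within one interval.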

Answer 4:

You can probably make a decoy, i.e. an application or shell script that is just a wrapper around the true application but adds some logging like "Application started". Then you rename your original app and give the original name to your decoy.

Answer 5:

As JimB mentions, you have the daemon write a PID file. You can tell if it's running or not by sending it signal 0, via either the kill(2) system call or the kill(1) program. The return status will tell you whether or not a process with that PID exists.
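A small sketch of the signal 0 probe using kill(2). The function name process_exists is hypothetical. Treating EPERM as "exists" reflects that the kernel found the process but refused permission to signal it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: probe a PID with signal 0, which performs the
 * existence and permission checks without delivering a signal.
 * Returns 1 if the process exists, 0 if it does not, -1 on other errors. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;       /* exists, but we may not signal it */
    if (errno == ESRCH)
        return 0;
    return -1;
}
```

Note that a zombie (exited but not yet reaped by its parent) still counts as existing for this check, and that a recycled PID can produce a false positive, so this is best combined with a check of what the PID is actually running.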

Answer 6:

Daemons should always:

1) Write the currently running instance's process ID to /var/run/$NAME.pid using getpid() (man getpid) or an equivalent command for your language.
2) Write a standard logfile to /var/log/$NAME.log (larger logfiles should be broken up into .0.log for currently running logs along with .X.log.gz for other logs, where X is a number with lower being more recent).
3) /Should/ have an LSB compatible run script accepting at least the start, stop, status and restart flags. Status could be used to check whether the daemon is running.
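The first point can be sketched as follows. The function name write_pid_file is hypothetical, and this minimal version does not lock the file or guard against a second instance, which a production daemon would probably want.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write this process's PID to the file at path,
 * replacing any previous contents.
 * Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", (long)getpid()) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The daemon should call this after its final fork, so that the recorded PID is the one that keeps running, and remove the file on clean shutdown. A pid file left behind with no matching process is then a sign of a crash.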

Answer 7:

I don't know of a standard way of getting all the process names that have executed; there might be a way however to do this with SystemTap.

If you just want to monitor your process, I would recommend using waitid (man 2 wait) after the fork instead of detaching and daemonizing.

Answer 8:

If your app has crashed, that's not distinguishable from "your app was never started", unless your app writes in the system log. syslog(3) is your friend.

To find your app you can try a number of ideas:

Look in the /proc filesystem
Run the ps command
Try killall -0 appname and check the return code
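The /proc approach can be sketched like this, roughly what ps does internally. The function name count_processes_named is hypothetical. It matches against /proc/<pid>/comm, which the kernel truncates to 15 characters, so longer program names would need to be compared against /proc/<pid>/cmdline instead.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count running processes whose command name
 * (/proc/<pid>/comm) equals name.
 * Returns the count, or -1 if /proc cannot be opened. */
int count_processes_named(const char *name)
{
    DIR *proc = opendir("/proc");
    struct dirent *ent;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((ent = readdir(proc)) != NULL) {
        char path[300];
        char comm[64];
        FILE *f;

        /* process directories are the ones with numeric names */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;
        snprintf(path, sizeof path, "/proc/%s/comm", ent->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;           /* process exited while we were scanning */
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';   /* strip trailing newline */
            if (strcmp(comm, name) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

A result of 0 only says the app is not running now. As the answer notes, it cannot tell a crash apart from a process that never started unless the app also left a trace in a log.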

Source: https://stackoverflow.com/questions/909798/how-to-know-if-a-process-had-been-started-but-crashed-in-linux

Tags

c++

Linux

debugging

monitoring