For example, does the bash debugger support attaching to an existing process and examining its current state?
Or can I easily find out by looking at the bash process entries in /
There is no real solution. But in most cases a hung script is just waiting for a child process to terminate, so list its children:
ps --ppid $(pidof -x yourscript)
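A slightly fuller variation, assuming procps ps (for --ppid and the -o column names) and a single running instance of the script. list_children is a hypothetical helper name; it also shows each child's state and kernel wait channel (wchan), which often hints at what the child is blocked on.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the direct children of PID $1 with their state,
# kernel wait channel and full command line, to see what a script waits on.
# Assumes procps ps (for --ppid and these -o column names).
list_children() {
    local pid=$1
    ps --ppid "$pid" -o pid,stat,wchan:32,args
}
```

For example: list_children "$(pidof -x yourscript)". If several instances are running, pidof prints several PIDs, so loop over them instead.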
You could also set up signal handlers in your shell script to toggle the printing of commands (set -x / set +x):
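#!/bin/bash
# SIGUSR1 turns on command tracing (xtrace), SIGUSR2 turns it off again
trap "set -x" SIGUSR1
trap "set +x" SIGUSR2
# Stand-in for the script's real, long-running work
while true; do
    sleep 1
done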
Then, from another shell, toggle tracing with:
kill -USR1 $(pidof -x yourscript)
kill -USR2 $(pidof -x yourscript)
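One catch with this approach is that the set -x output goes to the script's standard error, which may be a terminal you can no longer see, or /dev/null. Also, bash only runs the trap after the current foreground command (here sleep 1) has finished. Below is a sketch that sends the trace to a file instead, assuming bash 4.1 or later (for BASH_XTRACEFD); the helper name enable_signal_tracing and the choice of fd 9 are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: toggle "set -x" with SIGUSR1/SIGUSR2, but write the
# trace to a file so it can be read while the script keeps running.
# Requires bash 4.1+ for BASH_XTRACEFD; fd 9 is an arbitrary choice.
enable_signal_tracing() {
    local trace_file=$1
    exec 9>>"$trace_file"      # open fd 9 in append mode for the trace
    BASH_XTRACEFD=9            # bash writes xtrace output to this fd
    trap 'set -x' USR1         # kill -USR1 <pid>: start tracing
    trap 'set +x' USR2         # kill -USR2 <pid>: stop tracing
}

# Usage inside a long-running script:
#   enable_signal_tracing /tmp/myscript.trace
#   while true; do sleep 1; done
# and in another shell: tail -f /tmp/myscript.trace
```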
I recently found myself in a similar position: I had a shell script that could not be identified by other means (such as its arguments).
There are ways to find out a lot more about a running process than you would expect.
Use lsof -p $pid to see which files are open; that may give you some clues. Note that some files, although deleted, can still be held open by the script. As long as the script does not close such a file, it can still read from and write to it, and the file still takes up space on the file system.
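lsof marks such files with "(deleted)". The same information is available without lsof through the symlinks in /proc/$pid/fd, and a deleted file that is still open can be copied back out through its entry there. A minimal sketch; list_deleted_fds is a hypothetical helper, and you need to be the process owner (or root) to look inside /proc/$pid/fd.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the fd entries of PID $1 that point to files
# which have been deleted but are still held open by the process.
list_deleted_fds() {
    local pid=$1 fd target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue   # fd may close while we look
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# To recover the contents of such a file, copy it out via its fd entry, e.g.:
#   cp /proc/$pid/fd/3 /tmp/recovered
```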
Use strace to actively trace the system calls made by the script. The shell reads the script file while running it, so you can see some of the commands as they are read, before they are executed. Look for read calls in the output of this command:
strace -p $pid -s 1024
The -s 1024 option makes strace print strings up to 1024 characters long (by default, strace truncates strings to 32 characters).
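To cut down the noise, strace can be limited to read calls with -e trace=read and can write to a log file with -o. A sketch wrapping that up; trace_reads and the default log name strace.log are hypothetical. Attaching with -p requires being the same user, and on systems where Yama's ptrace_scope is set to 1 (the default on some distributions, such as Ubuntu) it usually requires root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: attach to PID $1 and log only read(2) calls, with
# strings up to 1024 bytes, to the file $2 (default: strace.log).
trace_reads() {
    local pid=$1 log=${2:-strace.log}
    if ! command -v strace >/dev/null; then
        echo "strace is not installed" >&2
        return 1
    fi
    # -e trace=read: only read(2); -s 1024: longer strings; -o: log file.
    # Runs until interrupted with Ctrl-C.
    strace -p "$pid" -e trace=read -s 1024 -o "$log"
}
```

Stop it with Ctrl-C and inspect strace.log afterwards, or follow it live from another terminal with tail -f strace.log.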
Examine the directory /proc/$pid for details about the script; in particular, /proc/$pid/environ holds the process environment, with entries separated by NUL bytes. To read this "file" properly, convert the NULs to newlines:
tr '\0' '\n' < /proc/$pid/environ
You can pipe that into less or save it to a file. There is also /proc/$pid/cmdline, but it may only show you the shell name (-bash, for instance).
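Both files are NUL-separated, so a small sketch can print them in readable form; show_proc_info is a hypothetical helper name. Keep in mind that environ holds the environment the process started with (as passed to execve), so variables the script exported later do not show up there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command line and the initial environment
# of PID $1. Both /proc files separate their entries with NUL bytes.
show_proc_info() {
    local pid=$1
    echo "cmdline:"
    tr '\0' ' ' < "/proc/$pid/cmdline"   # arguments, space-separated
    echo
    echo "environ:"
    tr '\0' '\n' < "/proc/$pid/environ"  # one VAR=value per line
}
```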
Use pstree to see which Linux commands/executables your script is calling. For example, 21156 is the PID of my hanging script:
ocfs2cts1:~ # pstree -pl 21156
activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
                                       ├─ssh(15148)
                                       └─{mpirun}(15147)
So I know it is hanging in the chmod command. Then, show its kernel stack trace with:
ocfs2cts1:~ # cat /proc/15232/stack
[<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
[<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
[<ffffffff8120d03c>] iterate_dir+0x9c/0x110
[<ffffffff8120d453>] SyS_getdents+0x83/0xf0
[<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff
Oh boy, it is blocked in __ocfs2_cluster_lock; it's likely a deadlock bug...
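The two steps above (find the process at the bottom of the tree, then read its kernel stack) can be combined in a sketch like the following. leaf_descendants and dump_leaf_stacks are hypothetical helpers; they assume procps ps, follow child processes only (not threads such as {mpirun} above), and reading /proc/PID/stack normally requires root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print the PIDs of all leaf descendants of PID $1
# (processes with no children of their own), then dump each one's kernel
# stack. Assumes procps ps; /proc/PID/stack is normally readable only by root.
leaf_descendants() {
    local pid=$1 child found=0
    for child in $(ps --ppid "$pid" -o pid=); do
        found=1
        leaf_descendants "$child"   # recurse into each child
    done
    # No children: this process is a leaf.
    (( found )) || echo "$pid"
}

dump_leaf_stacks() {
    local leaf
    for leaf in $(leaf_descendants "$1"); do
        echo "=== $(ps -o pid= -o stat= -o args= -p "$leaf")"
        cat "/proc/$leaf/stack" 2>/dev/null \
            || echo "    (cannot read /proc/$leaf/stack; try as root)"
    done
}
```

Run as root, dump_leaf_stacks 21156 would walk down to chmod(15232) and ssh(15148) in the example above and print their stacks.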