Linux kernel live debugging, how it's done and what tools are used?

What are the most common, and for that matter the less common, methods and tools used for live debugging of the Linux kernel? I know that Linus, for example, is against this kind of debugging fo

11 answers

野的像风

2020-11-28 02:24
QEMU + GDB step-by-step procedure tested on Ubuntu 16.10 host

To get started from scratch quickly, I've made a minimal, fully automated QEMU + Buildroot example at: https://github.com/cirosantilli/linux-kernel-module-cheat

The major steps are covered below.

First get a root filesystem rootfs.cpio.gz. If you need one, consider:
- a minimal init-only executable image: https://unix.stackexchange.com/questions/122717/custom-linux-distro-that-runs-just-one-program-nothing-else/238579#238579
- a Busybox interactive system: https://unix.stackexchange.com/questions/2692/what-is-the-smallest-possible-linux-implementation/203902#203902
Then, in the Linux kernel source tree:
```
git checkout v4.9
make mrproper
make x86_64_defconfig
cat <<EOF >.config-fragment
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_KERNEL=y
CONFIG_GDB_SCRIPTS=y
EOF
./scripts/kconfig/merge_config.sh .config .config-fragment
make -j"$(nproc)"
qemu-system-x86_64 -kernel arch/x86/boot/bzImage \
                   -initrd rootfs.cpio.gz -S -s
```
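Kconfig dependencies can drop an option from the final .config, so it may be worth checking that the fragment actually took effect before booting. A minimal sketch, assuming a POSIX shell; check_kconfig is a hypothetical helper name and is not part of the kernel tree:

```shell
# check_kconfig FILE OPTION...
# Hypothetical helper (not part of the kernel tree).
# Prints "missing: OPTION" for every OPTION that is not set to "y" in FILE
# and returns non-zero if at least one option is missing.
check_kconfig() {
    config_file="$1"
    shift
    missing=0
    for opt in "$@"; do
        # -x matches the whole line, so "# CONFIG_FOO is not set" does not count
        if ! grep -qx "${opt}=y" "$config_file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

After merge_config.sh, it could be called as: check_kconfig .config CONFIG_DEBUG_INFO CONFIG_DEBUG_KERNEL CONFIG_GDB_SCRIPTS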
On another terminal, supposing you want to start debugging from start_kernel:
```
gdb \
    -ex "add-auto-load-safe-path $(pwd)" \
    -ex "file vmlinux" \
    -ex 'set arch i386:x86-64:intel' \
    -ex 'target remote localhost:1234' \
    -ex 'break start_kernel' \
    -ex 'continue' \
    -ex 'disconnect' \
    -ex 'set arch i386:x86-64' \
    -ex 'target remote localhost:1234'
```
and we are done!!

For kernel modules see: How to debug Linux kernel modules with QEMU?

On Ubuntu 14.04 with GDB 7.7.1, hbreak was needed because software breakpoints set with break were ignored. This is no longer the case on 16.10. See also: https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/901944

The messy disconnect and what comes after it work around the error:
```
Remote 'g' packet reply is too long: 000000000000000017d11000008ef4810120008000000000fdfb8b07000000000d352828000000004040010000000000903fe081ffffffff883fe081ffffffff00000000000e0000ffffffffffe0ffffffffffff07ffffffffffffffff9fffff17d11000008ef4810000000000800000fffffffff8ffffffffff0000ffffffff2ddbf481ffffffff4600000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f0300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000
```
Related threads:
- https://sourceware.org/bugzilla/show_bug.cgi?id=13984 might be a GDB bug
- Remote 'g' packet reply is too long
- http://wiki.osdev.org/QEMU_and_GDB_in_long_mode osdev.org is as usual an awesome source for these problems
- https://lists.nongnu.org/archive/html/qemu-discuss/2014-10/msg00069.html
See also:
- https://github.com/torvalds/linux/blob/v4.9/Documentation/dev-tools/gdb-kernel-debugging.rst official Linux kernel "documentation"
- How to debug the Linux kernel with GDB and QEMU?
Known limitations:
- the Linux kernel does not support -O0 (and does not even compile with it without patches): How to de-optimize the Linux kernel to and compile it with -O0?
- GDB 7.11 will exhaust your memory on some types of tab completion, even after the max-completions fix (see: Tab completion interrupt for large binaries). This is likely a corner case that the patch did not cover, so running ulimit -Sv 500000 before debugging is a wise precaution. It blew up specifically when I tab completed file<tab> for the filename argument of sys_execve, as in: https://stackoverflow.com/a/42290593/895245
我寻月下人不归

2020-11-28 02:26
Another option is to use an ICE/JTAG controller with GDB. This 'hardware' solution is used especially with embedded systems, but QEMU, for instance, offers similar features:
- start QEMU with a GDB 'remote' stub listening on localhost:1234: qemu -s ...,
- then open the kernel file vmlinux, compiled with debug information, in GDB (take a look at this mailing list thread, where they discuss de-optimizing the kernel),
- connect GDB and QEMU: target remote localhost:1234
- see your live kernel:
```
(gdb) where
#0  cpu_v7_do_idle () at arch/arm/mm/proc-v7.S:77
#1  0xc0029728 in arch_idle () at arm/mach-realview/include/mach/system.h:36
#2  default_idle () at arm/kernel/process.c:166
#3  0xc00298a8 in cpu_idle () at arch/arm/kernel/process.c:199
#4  0xc00089c0 in start_kernel () at init/main.c:713
```
Unfortunately, user-space debugging is not possible so far with GDB (no task list information, no MMU reprogramming to see different process contexts, ...), but if you stay in kernel space, that's quite convenient.
- info threads will give you the list and states of the different CPUs
EDIT:

You can get more details about the procedure in this PDF:

Debugging Linux systems using GDB and QEMU.
[愿得一人]

2020-11-28 02:26
Several kinds of tools can be used to debug the Linux kernel: debuggers (KDB, KGDB), crash dump tools (LKCD), tracing toolkits (LTT, LTTV, LTTng), and custom kernel instrumentation (dprobes, kprobes). The following sections summarize most of them; I hope they help.

LKCD (Linux Kernel Crash Dump) allows a Linux system to write out the contents of its memory when a crash occurs. These dumps can then be analyzed to find the root cause of the crash. Resources regarding LKCD:
- http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaax/lkcd.pdf
- https://www.novell.com/coolsolutions/feature/15284.html
- https://www.novell.com/support/kb/doc.php?id=3044267
Oops: when the kernel detects a problem, it prints an Oops message. The message is generated by printk statements in the fault handler (arch/*/kernel/traps.c), and printk writes to a dedicated ring buffer in the kernel. An Oops contains information such as the CPU on which it occurred, the contents of the CPU registers, the Oops count, a description, and a stack backtrace. Resources regarding kernel Oops:
- https://www.kernel.org/doc/Documentation/oops-tracing.txt
- http://madwifi-project.org/wiki/DevDocs/KernelOops
- https://wiki.ubuntu.com/DebuggingKernelOops
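Because the Oops goes through printk, it can be read back from the ring buffer with dmesg. The sketch below pulls Oops reports out of a log. It assumes the x86-style "Oops:" and "---[ end trace" markers, so lines printed before the "Oops:" line (such as a preceding "BUG:" line) are not included; extract_oops is a hypothetical helper name:

```shell
# extract_oops
# Hypothetical helper. Reads a kernel log on stdin and prints every Oops
# report, from its "Oops:" line up to the "---[ end trace" marker.
# Assumption: the log uses the x86-style markers named above.
extract_oops() {
    sed -n '/Oops:/,/---\[ end trace/p'
}
```

On a live system it could be used as: dmesg | extract_oops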
Dynamic Probes (dprobes) is a popular debugging tool for Linux developed by IBM. It allows a “probe” to be placed at almost any point in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control reaches the given point. Resources regarding Dynamic Probes:
- http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaax/dprobesltt.pdf
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.6212&rep=rep1&type=pdf
Linux Trace Toolkit (LTT) is a kernel patch and a set of related utilities that allow tracing of events in the kernel. The trace includes timing information and can give a reasonably complete picture of what happened over a given period of time. Resources on LTT, LTT Viewer (LTTV) and LTT Next Generation (LTTng):
- http://elinux.org/Linux_Trace_Toolkit
- http://www.linuxjournal.com/article/3829
- http://multivax.blogspot.com/2010/11/introduction-to-linux-tracing-toolkit.html
MEMWATCH is an open source memory error detection tool. It is enabled by defining MEMWATCH on the gcc command line and including its header file in the code, after which it can track memory leaks and memory corruption. Resources regarding MEMWATCH:
- http://www.linuxjournal.com/article/6059
ftrace is a tracing framework for the Linux kernel that traces the kernel's internal operations. It was merged into the Linux kernel in 2.6.27. With its various tracer plugins, ftrace can be targeted at different static tracepoints, such as scheduling events, interrupts, memory-mapped I/O, CPU power state transitions, and operations related to file systems and virtualization. Dynamic tracing of kernel function calls is also available, optionally restricted to a subset of functions by using globs, with the possibility of generating call graphs and reporting stack usage. A good ftrace tutorial is available at https://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf
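ftrace is driven by writing to files in tracefs. The sketch below enables the function_graph tracer for a single function. It assumes tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), that the kernel was built with the function graph tracer and dynamic ftrace, and that it runs as root; trace_function_graph is a hypothetical helper name:

```shell
# trace_function_graph FUNC [TRACEFS]
# Hypothetical helper. Restricts the function_graph tracer to FUNC and
# turns tracing on. Needs root on a real system.
# TRACEFS defaults to /sys/kernel/tracing (assumption about the mount point).
trace_function_graph() {
    func="$1"
    tracefs="${2:-/sys/kernel/tracing}"
    echo 0 > "$tracefs/tracing_on"                   # pause while reconfiguring
    echo "$func" > "$tracefs/set_graph_function"     # only graph this function
    echo function_graph > "$tracefs/current_tracer"  # select the tracer plugin
    echo 1 > "$tracefs/tracing_on"                   # resume tracing
}
```

For example, after trace_function_graph vfs_read, the call graph can be read with cat /sys/kernel/tracing/trace, and echo nop > /sys/kernel/tracing/current_tracer switches the tracer off again.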

ltrace is a debugging utility for Linux that displays the calls a user-space application makes to shared libraries. It can trace any dynamic library function call: it intercepts and records the library calls made by the executed process and the signals received by that process. It can also intercept and print the system calls executed by the program.
- http://www.ellexus.com/getting-started-with-ltrace-how-does-it-do-that/?doing_wp_cron=1425295977.1327838897705078125000
- http://developerblog.redhat.com/2014/07/10/ltrace-for-rhel-6-and-7/
KDB is the in-kernel debugger of the Linux kernel. It has a simple shell-style interface. We can use it to inspect memory, registers, process lists and dmesg, and to set breakpoints and perform some basic kernel run control, although KDB is not a source-level debugger. Several handy resources regarding KDB:
- http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318
- http://elinux.org/KDB
- http://dev.man-online.org/man1/kdb/
- https://www.kernel.org/pub/linux/kernel/people/jwessel/kdb/usingKDB.html
KGDB is intended to be used as a source-level debugger for the Linux kernel, together with gdb. Two machines are required: a development machine, which runs gdb, and a target machine, which runs the kernel to be debugged. The expectation is that gdb can be used to "break in" to the kernel to inspect memory, variables and call stack information, much as an application developer would use gdb to debug an application. It is possible to place breakpoints in kernel code and perform some limited execution stepping. Several handy resources regarding KGDB:
- http://landley.net/kdocs/Documentation/DocBook/xhtml-nochunks/kgdb.html
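On the target side, breaking in usually means pointing kgdboc at a serial console and triggering sysrq-g. A minimal sketch, assuming a kernel built with KGDB, the kgdboc serial console driver and magic SysRq support; kgdb_break_in and its ROOT prefix argument are hypothetical and exist only so the steps can be dry-run against a scratch directory:

```shell
# kgdb_break_in [CONSOLE] [ROOT]
# Hypothetical helper, to be run as root on the TARGET machine.
# WARNING: on a real system this stops the kernel until gdb attaches.
# CONSOLE defaults to ttyS0,115200 (assumption about the serial setup).
# ROOT is an illustrative path prefix for dry runs; leave it empty normally.
kgdb_break_in() {
    console="${1:-ttyS0,115200}"
    root="${2:-}"
    # Tell kgdboc which console the debugger will talk over
    echo "$console" > "$root/sys/module/kgdboc/parameters/kgdboc"
    # sysrq-g drops the kernel into the debugger
    echo g > "$root/proc/sysrq-trigger"
}
```

On the development machine, gdb is then started on the matching vmlinux and attached with target remote /dev/ttyS0 (or whichever serial device is connected to the target).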
我在风中等你

2020-11-28 02:31
User mode Linux (UML)

https://en.wikipedia.org/wiki/User-mode_Linux

Another virtualization method that allows step debugging of kernel code.

UML is very ingenious: it is implemented as an ARCH, just like x86, but instead of using low level instructions, it implements the ARCH functions with userland system calls.

The result is that you are able to run Linux kernel code as a userland process on a Linux host!

First make a rootfs and run it as shown at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207

The um defconfig sets CONFIG_DEBUG_INFO=y by default (yup, it is a development thing), so we are fine.

On guest:
```
i=0
while true; do echo $i; i=$(($i+1)); done
```
On host in another shell:
```
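ps aux | grep ./linux
# Set pid to the PID of the main ./linux process shown above, for example
# (assumption: the oldest process matching ./linux is the UML kernel):
pid="$(pgrep -o -f ./linux)"
gdb -pid "$pid"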
```
In GDB:
```
break sys_write
continue
continue
```
And now you are controlling the count from GDB, and can see source as expected.

Pros:
- fully contained in the Linux kernel mainline tree
- more lightweight than QEMU's full system emulation
Cons:
- very invasive, as it changes how the kernel itself is compiled. But the higher level APIs outside of ARCH specifics should remain unchanged.
- arguably not very active: Is user mode linux (UML) project stopped?
See also: https://unix.stackexchange.com/questions/127829/why-would-someone-want-to-run-usermode-linux-uml
时光取名叫无心

2020-11-28 02:34
Another good tool for "live" debugging is kprobes / dynamic probes.

This lets you dynamically build little tiny modules which run when certain addresses are executed - sort of like a breakpoint.

Their big advantages are:
1. They do not impact the system: when a location is hit, the probe just executes its code and does not halt the whole kernel.
2. You don't need two interconnected systems (target and debug host) as with kgdb.
It is best for doing things like hitting a breakpoint, and seeing what data values are, or checking if things have been changed/overwritten, etc. If you want to "step through code" - it doesn't do that.
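One way to try a kprobe without writing a module is the kprobe-based event interface in tracefs. This is a different route from a hand-built probe module, though it is backed by the same kprobes machinery. A minimal sketch, assuming tracefs at /sys/kernel/tracing, a kernel with kprobe events enabled, root privileges, and a symbol that appears in /proc/kallsyms; add_kprobe_event is a hypothetical helper name:

```shell
# add_kprobe_event NAME SYMBOL [TRACEFS]
# Hypothetical helper. Defines a kprobe event called NAME at the entry of
# kernel function SYMBOL and enables it. Needs root on a real system.
# TRACEFS defaults to /sys/kernel/tracing (assumption about the mount point).
add_kprobe_event() {
    name="$1"
    symbol="$2"
    tracefs="${3:-/sys/kernel/tracing}"
    # "p:" defines a probe at function entry; ">>" appends to existing probes
    echo "p:$name $symbol" >> "$tracefs/kprobe_events"
    # Events without an explicit group land in the "kprobes" group
    echo 1 > "$tracefs/events/kprobes/$name/enable"
}
```

For example, after add_kprobe_event myread vfs_read, each hit is logged to /sys/kernel/tracing/trace while the system keeps running. Writing 0 to the enable file and then echo "-:myread" >> /sys/kernel/tracing/kprobe_events removes the probe.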

Addition - 2018:

Another very powerful method is a program simply called "perf", which rolls up many tools (like dynamic probes) and replaces/deprecates others (like oprofile).

In particular, the perf probe command can be used to easily add dynamic probes to the system, after which perf record can sample the system and capture information (and backtraces) each time the probe is hit, for reporting via perf report (or perf script). If you have good debug symbols in the kernel, you can get great intel out of the system without even taking the kernel down. Do a man perf (in Google or on your system) for more info on this tool, or see this great page on it:

http://www.brendangregg.com/perf.html
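The probe, record and report cycle can be sketched as a small wrapper. It assumes perf is installed, the caller is root, and the function name resolves to a single kernel symbol; perf_probe_session and the PERF override variable are hypothetical and exist only so the sequence can be dry-run:

```shell
# perf_probe_session FUNC [SECONDS]
# Hypothetical helper. Adds a dynamic probe on kernel function FUNC, records
# hits system-wide with backtraces for SECONDS (default 5), prints the
# recorded events, then removes the probe again.
# PERF is an illustrative override, e.g. PERF=echo prints the commands' arguments.
perf_probe_session() {
    func="$1"
    secs="${2:-5}"
    perf_bin="${PERF:-perf}"
    "$perf_bin" probe --add "$func" &&
        "$perf_bin" record -e "probe:$func" -a -g -- sleep "$secs" &&
        "$perf_bin" script
    # Always try to clean up the probe, even if recording failed
    "$perf_bin" probe --del "$func"
}
```

For example, perf_probe_session vfs_read 10 records ten seconds of vfs_read hits; perf report can be run afterwards on the same perf.data for an interactive summary.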