strace | 易学教程

How to locate the code responsible for high CPU/memory usage on Linux

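Some time ago, while developing software, I tracked down a high-CPU problem but never wrote it up. Here I walk through the process using FreeSWITCH (FS) as an example, as a reference for similar problems later. Note: this is a re-enactment rather than the original troubleshooting session, so the CPU figures differ from the real ones.
1. We notice that the running FreeSWITCH process is using too much CPU and need to find out which code is responsible. First, find the FS process ID (screenshot omitted).
2. Use top -H -p to view every thread in the process. In the screenshot, all threads show the same CPU usage because no calls are coming in; if something were wrong, the corresponding thread would show high CPU.
3. Use GDB to get the stacks: attach with gdb attach 18258, then list all threads with the info threads command. The output shows how GDB thread numbers 1-34 map to system thread IDs 18258 through 18311 (GDB lists each thread's LWP ID, which is the same TID that top shows).
4. Suppose top -H -p showed that the thread with excessive CPU is 18312, and that its GDB number is 9. Switch to that thread's stack with thread 9 in GDB, then view its current stack with bt (screenshot omitted).
5. Use strace to see which calls take the most time, then compare against the stack and the source code to find the cause. (Note: the FS instance in my screenshots had no problem, so the values shown are normal.) The command is strace -c -f -T -p 18258. Because the program is running normally, all the time is spent in select/epoll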
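Step 2 relies on top -H -p to spot the hot thread. As a rough illustration of where that per-thread figure comes from, here is a minimal C sketch, assuming Linux procfs, that walks /proc/<pid>/task and returns the TID with the most accumulated CPU time (utime + stime from each thread's stat file). The function names are hypothetical. One difference from top: this ranks threads by CPU time consumed since each thread started, whereas top shows usage over its refresh interval; to mimic top, sample twice and compare the differences.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h> /* getpid() for the usage example */

/* Cumulative CPU time (utime + stime, in clock ticks) of one thread,
 * read from /proc/<pid>/task/<tid>/stat. Returns -1 on failure. */
static long long thread_cpu_ticks(pid_t pid, pid_t tid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/task/%d/stat", (int)pid, (int)tid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2 (comm) is in parentheses and may contain spaces, so parse
     * from the last ')': state, ppid..cmajflt (skipped), utime, stime. */
    const char *p = strrchr(buf, ')');
    unsigned long utime, stime;
    if (!p || sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                     &utime, &stime) != 2)
        return -1;
    return (long long)utime + (long long)stime;
}

/* TID of the thread in process pid that has used the most CPU time so
 * far, or -1 if /proc/<pid>/task cannot be read. */
pid_t busiest_thread(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *d = opendir(path);
    if (!d)
        return -1;
    pid_t best = -1;
    long long best_ticks = -1;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue; /* skip "." and ".." */
        pid_t tid = (pid_t)atoi(e->d_name);
        long long ticks = thread_cpu_ticks(pid, tid);
        if (ticks > best_ticks) {
            best_ticks = ticks;
            best = tid;
        }
    }
    closedir(d);
    return best;
}
```

For example, busiest_thread(18258) would return a TID that can be matched against the LWP numbers GDB prints in info threads.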

Diagnosing timeout blocking and hangs in a multithreaded process

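Background: at work I ran into a multithreaded process in which some threads (including the main thread) seemed hung: they stopped handling other events, as if asleep indefinitely.
Topics: processes, threads, mutexes, condition variables, sockets.
Tools: strace, htop, gdb.
Analysis: the process was still running, and htop showed that all of its threads still existed, so the first step was to see what state each thread was in. Inspecting the main thread's current stack with gdb (output omitted) showed that it had called pthread_join to wait for another thread to finish and was blocked there. Which thread it was waiting for could be worked out from the surrounding log context (this depends on the code, so each case must be analyzed on its own).
Next, gdb was used to inspect the stack of the thread the main thread was waiting on. That thread was sitting in pthread_cond_timedwait, apparently waiting for either an event or a timeout. No event ever arrived, and the timeout was 500 ms, so why did it never time out?
To find out, strace was used to trace the process's system calls and signals (output omitted). On line 17 of the output, 5621 is the main thread, waiting for thread 509 to exit; on line 1, 509 is the thread that should exit, and it is waiting on the resource at 0xd32c2c. The wait timeout is judged against the system real-time clock as an absolute time, and 2147483611 is that deadline in seconds since the epoch. Converted, it is "2038-01-19 11:13:31" (UTC+8), just 36 seconds before the 32-bit time_t limit of 2147483647. In other words, the wait would not time out until 2038-01-19 11:13:31, which is alarming.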

Tracing calls to a shared library

Question: I am developing a program under Linux. For debugging purposes I want to trace all calls from my program to a certain (preferably shared) library. (I do not want to trace calls happening inside the library.) For syscalls there is strace. Is there any tool to trace calls to a shared library? Answer 1: The tool you are looking for is called ltrace. It lets you trace any call from the program to all (or a given set of) libraries. For example, the following call will list any call to an
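ltrace is the tool the answer names. A different technique that achieves a similar effect for a single function is LD_PRELOAD interposition: a small shared library defines a function with the same name, logs the call, and forwards it to the real implementation found with dlsym(RTLD_NEXT, ...). The sketch below wraps libc's puts as an example; the counter and traced_puts_calls are hypothetical additions for illustration.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>

/* Number of intercepted calls (illustrative only). */
static unsigned long puts_calls = 0;

/* This definition shadows libc's puts; dlsym(RTLD_NEXT, "puts") finds
 * the next definition in symbol lookup order, normally libc's. */
int puts(const char *s)
{
    static int (*real_puts)(const char *) = NULL;
    if (!real_puts)
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
    puts_calls++;
    fprintf(stderr, "[trace] puts(\"%s\")\n", s); /* log the call */
    return real_puts ? real_puts(s) : EOF;       /* forward it */
}

unsigned long traced_puts_calls(void)
{
    return puts_calls;
}
```

To use it on another program, build it as a shared object and preload it, for example gcc -shared -fPIC -o libtraceputs.so trace_puts.c -ldl and then LD_PRELOAD=./libtraceputs.so ./myprog (file names are hypothetical). Only calls resolved through the dynamic linker are intercepted, so calls the library makes to itself are generally not logged, which matches what the question asks for.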

poll system call timeout

Question: Attaching strace shows a lot of these messages:
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 6, 0) = 0 (Timeout)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 6, 0) = 0 (Timeout)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events
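In the strace lines above, the last argument to poll is the timeout in milliseconds. A timeout of 0 makes poll return immediately, so "= 0 (Timeout)" only means no descriptor was ready at that instant; many such lines in quick succession usually indicate a loop that polls without ever blocking. A minimal C sketch of the three timeout modes (wait_readable is a hypothetical helper):

```c
#include <poll.h>
#include <unistd.h> /* pipe() and write() for the usage example */

/* Wait until fd is readable.
 *   timeout_ms == 0 -> check once and return immediately
 *   timeout_ms  > 0 -> block for at most that many milliseconds
 *   timeout_ms  < 0 -> block until an event arrives
 * Returns 1 if fd is ready, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms);
}
```

For example, after pipe(p), wait_readable(p[0], 0) returns 0 at once; after one byte has been written to p[1], it returns 1. A loop that only ever passes 0 and finds nothing ready spins on the CPU, whereas passing -1 (or a positive timeout) lets the thread sleep in the kernel.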

Capture vDSO in strace

Question: I was wondering whether there is a way to capture (in other words, observe) vDSO calls such as gettimeofday in strace. Also, is there a way to execute a binary without loading linux-vdso.so.1 (a flag or an environment variable)? And lastly, what if I write a program that deletes the linux-vdso.so.1 address from the auxiliary vector and then execve's my program? Has anyone ever tried that? Answer 1: You can capture calls to system calls which have been implemented via the vDSO by using ltrace instead of strace. This is
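Background for the question: on Linux, glibc usually services calls such as gettimeofday and clock_gettime inside the vDSO, a small shared object the kernel maps into every process, so no system call is made and strace has nothing to report. The sketch below (function names are hypothetical) reads the vDSO's load address from the auxiliary vector with getauxval(AT_SYSINFO_EHDR) and, for comparison, issues clock_gettime as a raw system call with syscall(SYS_clock_gettime, ...), which bypasses the vDSO and does appear in strace.

```c
#define _GNU_SOURCE
#include <sys/auxv.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Address at which the kernel mapped the vDSO (linux-vdso.so.1),
 * taken from the auxiliary vector; 0 if there is none. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* clock_gettime() via glibc normally runs inside the vDSO and is
 * invisible to strace. Calling the system call directly skips the
 * vDSO, so strace shows it as clock_gettime(CLOCK_REALTIME, ...). */
int clock_gettime_raw(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_REALTIME, ts);
}
```

Running a program that calls both under strace -e trace=clock_gettime should show the raw call but typically not the one made through glibc.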

adding “-rpath,/usr/lib” in the build options of a shared library cause a segfault

Question: I have a hello world program:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        printf("hello world! \n");
        return 0;
    }

I add -lmicroxml at the link stage of the build so that the program links against libmicroxml.so. When I launch my program, I get a segmentation fault, and the segmentation fault is related to loading libmicroxml.so. Here is the strace of my hello world program's execution:

    strace ./test
    execve("./test", ["./test"], [/* 11 vars */]) = 0
    old_mmap(NULL,

How does strace read the file name of system call sys_open?

Question: I am writing a program which uses ptrace and does the following: it reads the current eax and checks whether the system call is sys_open. If it is, I need to know which arguments were passed: int sys_open(const char *filename, const int mode, const int mask). So eax = 5 means it is an open system call. I learned from this question that ebx holds the address of the file name. But how do I know the length of the file name so that I can read the contents at that location? I came across
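The length is not needed up front: a tracer can read the tracee's memory one word at a time and stop at the terminating NUL byte. strace does this (newer versions can also use process_vm_readv). Below is a minimal sketch with PTRACE_PEEKDATA; peek_string and spawn_stopped_child are hypothetical names, and the sketch ignores one edge case a real tracer must handle: a word read that extends past the end of a mapped page can fail even though the string already ended.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy a NUL-terminated string from a stopped tracee, starting at addr,
 * one machine word at a time with PTRACE_PEEKDATA, stopping at the
 * first NUL. Returns the length (without the NUL) or -1 on error.
 * If buf is too small, the copy is truncated but still NUL-terminated. */
long peek_string(pid_t pid, unsigned long addr, char *buf, size_t bufsz)
{
    size_t n = 0;
    if (bufsz == 0)
        return -1;
    while (n + 1 < bufsz) {
        errno = 0; /* -1 is a valid word, so only errno signals failure */
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + n), NULL);
        if (word == -1 && errno != 0)
            return -1;
        const char *bytes = (const char *)&word;
        for (size_t i = 0; i < sizeof word && n + 1 < bufsz; i++) {
            buf[n] = bytes[i];
            if (bytes[i] == '\0')
                return (long)n;
            n++;
        }
    }
    buf[n] = '\0';
    return (long)n;
}

/* Hypothetical demo helper: fork a child that requests tracing and stops
 * itself, so the parent may read its memory. Returns the child PID or -1. */
pid_t spawn_stopped_child(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }
    int status;
    if (child < 0 || waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    return child;
}
```

In the 32-bit x86 case from the question, addr would be the value of ebx at the open syscall-entry stop. On x86-64, the syscall number is in orig_rax (open is 2, openat is 257), and the path pointer is in rdi for open or rsi for openat.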

How to decode this information from strace output

Question: I wrote a small Go script and traced it with strace. Through this script I am trying to fetch audit messages from the kernel using the netlink protocol, just like auditd. Here is the strace output for my Go script: http://paste.ubuntu.com/8272760/ I am trying to find the arguments that auditd passes to the sendto function. When I run strace on auditd, I get the following output:
sendto(3, "\20\0\0\0\350\3\5\0\1\0\0\0\0\0\0\0", 16, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 16
And
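The quoted string in the sendto line is the raw netlink message, printed with octal escapes, and the trailing 12 is sizeof(struct sockaddr_nl). The 16 bytes form a struct nlmsghdr whose fields are in host byte order. Decoded by hand on a little-endian machine: \20\0\0\0 is nlmsg_len = 16; \350\3 is nlmsg_type = 232 + 3 * 256 = 1000 (AUDIT_GET in <linux/audit.h>); \5\0 is nlmsg_flags = 5 = NLM_F_REQUEST | NLM_F_ACK; \1\0\0\0 is nlmsg_seq = 1; and the last four zero bytes are nlmsg_pid = 0. A minimal C sketch that does the same with memcpy, assuming a little-endian host (decode_nlmsghdr is a hypothetical helper):

```c
#include <linux/netlink.h>
#include <string.h>

/* The 16-byte payload strace printed for auditd's sendto(), written
 * with the same octal escapes strace uses. */
static const char auditd_msg[16] = "\20\0\0\0\350\3\5\0\1\0\0\0\0\0\0\0";

/* Copy raw bytes into a struct nlmsghdr. The fields are in host byte
 * order, so this assumes the trace came from a little-endian host. */
struct nlmsghdr decode_nlmsghdr(const char *bytes)
{
    struct nlmsghdr h;
    memcpy(&h, bytes, sizeof h);
    return h;
}
```

A Go program reproducing auditd's request would therefore send a 16-byte header with type 1000, flags NLM_F_REQUEST | NLM_F_ACK, and an empty payload.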

Using strace fixes hung memory issue

Question: I have a multithreaded process running on RHEL 6.x (64-bit). I find that the process hangs and some threads (of the same process) crash most of the time when I try to bring up the process. Some threads wait for shared memory between the threads to be created (I can see that not all of it gets created). But when I use strace, the process does not hang and it works just fine (all of the memory that is supposed to be created gets created). Even interrupting strace after the memory gets

Why you should learn a little about TCP

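This doesn't mean you need to understand everything about TCP, or that you should read all of TCP/IP Illustrated. But knowing a little TCP is essential. Here's why. When I was at the Recurse Center, I wrote a TCP stack in Python (and wrote a post about it: "What happens if you write a TCP stack in Python?"). It was a fun learning experience, but that was about it. A year later, someone at work mentioned on Slack: "Hey, publishing messages to NSQ takes 40 ms each time." I had been thinking about this on and off for a week without getting anywhere. Some background: NSQ is a message queue, and you publish messages to it with a local HTTP request. A local HTTP request really shouldn't take 40 ms, and sometimes it was even worse. The NSQ daemon wasn't under heavy load, wasn't using much memory, and there were no visible GC pauses. What on earth was going on? Help! Then I suddenly remembered an article I had read a week earlier, "In search of performance: how we shaved 200ms off every POST request". It explains why every POST request was taking an extra 200 ms, and that was exactly the cause. Here is the key passage from the article: Delayed ACKs and TCP_NODELAY. Ruby's Net::HTTP splits POST requests into two TCP packets, one for the headers and one for the body. In contrast, curl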
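The key passage concerns the interaction between Nagle's algorithm and delayed ACKs: when a request is written as two small segments, the sender may hold back the second segment until the first has been acknowledged, while the receiver may delay that ACK (on Linux the minimum delayed-ACK timeout is 40 ms). Setting TCP_NODELAY on the sending socket disables Nagle's algorithm. A minimal C sketch (set_tcp_nodelay is a hypothetical helper):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket so that small writes, such
 * as a header followed by a separate body write, are sent immediately
 * instead of waiting for the peer to ACK earlier data.
 * Returns 0 on success, -1 on error (errno set). */
int set_tcp_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}
```

An alternative that keeps Nagle enabled is to send the header and body in a single write (for example with writev), so the request does not end in a small trailing segment that has to wait for an ACK.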
