rdtsc

Where is the code for RDTSC handler in QEMU source code?

青春壹個敷衍的年華 submitted on 2019-12-13 17:32:01
Question: I am working on an application that requires me to change the part of the QEMU source code that handles RDTSC calls. However, I have not been able to locate it in the huge source tree.
Answer 1: The key portion is in target-i386/translate.c:
    case 0x131: /* rdtsc */
        if (s->cc_op != CC_OP_DYNAMIC)
            gen_op_set_cc_op(s->cc_op);
        gen_jmp_im(pc_start - s->cs_base);
        if (use_icount)
            gen_io_start();
        gen_helper_rdtsc();
        if (use_icount) {
            gen_io

Perf Monitoring for rdtsc dynamically

烈酒焚心 submitted on 2019-12-13 03:27:13
Question: Is there a way to monitor assembly instructions dynamically, in "real time", using perf? I have seen that if I use perf record / perf top and then click on the recorded functions, I see the assembly instructions. But can I directly monitor specific assembly instructions such as rdtsc or clflush, for example how often a process executes them within a certain period? I am using Debian 9 on Skylake and also on Haswell.
    sudo uname -a
    Linux bla 4.9.0-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27

What is the gcc cpu-type that includes support for RDTSCP?

ε祈祈猫儿з submitted on 2019-12-10 20:17:22
Question: I am using RDTSCP to replace LFENCE;RDTSC sequences and also to get the processor ID back, so that I know when I am comparing TSC values after the thread was rescheduled to another CPU. To ensure I don't run RDTSCP on a machine that is too old, I fall back to RDTSC after a CPUID check (using libcpuid). I'd like to try using the gcc multiple target attribute functionality instead of a CPUID call:
    int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
The gcc manual lists a number of cpu families
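A minimal sketch of the runtime check this question describes, assuming GCC or Clang on x86-64. It uses the compiler's <cpuid.h> helper __get_cpuid in place of libcpuid (a substitution made only to keep the example self-contained) and tests CPUID leaf 0x80000001, EDX bit 27, which reports RDTSCP support. The names tsc_sample, has_rdtscp and read_tsc are hypothetical.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>      /* __get_cpuid (GCC/Clang helper) */
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence */
#endif

/* Hypothetical result type: the timestamp plus the IA32_TSC_AUX value
   returned by RDTSCP, or -1 when RDTSCP was not used. */
typedef struct {
    uint64_t tsc;
    int64_t  aux;
} tsc_sample;

/* Returns 1 if CPUID.80000001H:EDX bit 27 (RDTSCP) is set, otherwise 0. */
int has_rdtscp(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;                   /* extended leaf not available */
    return (int)((edx >> 27) & 1u);
#else
    return 0;                       /* not an x86-64 build */
#endif
}

/* Uses RDTSCP when available, otherwise falls back to LFENCE;RDTSC. */
tsc_sample read_tsc(void)
{
    tsc_sample s = { 0, -1 };
#if defined(__x86_64__)
    if (has_rdtscp()) {
        unsigned int aux;
        s.tsc = __rdtscp(&aux);     /* aux receives IA32_TSC_AUX */
        s.aux = aux;
    } else {
        _mm_lfence();               /* order RDTSC after earlier instructions */
        s.tsc = __rdtsc();
    }
#endif
    return s;
}
```

Taking one sample before and one after a region and comparing the two aux fields shows whether both readings carry the same IA32_TSC_AUX value. What that value encodes is decided by the operating system, so treating it as a CPU number is an assumption to verify on the target platform.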

Is there any way to trigger a legacy mode for RDTSC?

Deadly submitted on 2019-12-10 19:52:19
Question: I rewrote the entire question because people clearly weren't understanding it. RDTSC used to count CPU cycles, and its rate varied with CPU throttling. Currently, RDTSC doesn't vary with CPU throttling. Some old applications expect RDTSC to vary with CPU throttling. How do I make RDTSC give them what they expect? I don't want to profile code, I don't want to rewrite massive amounts of code, and I don't want to oblige users to mess with the BIOS or kernel permissions. I just want to make legacy apps work as

Is there a cheaper serializing instruction than cpuid?

本小妞迷上赌 submitted on 2019-12-06 13:40:55
Question: I have seen the related questions, including here and here, but it seems that the only instruction ever mentioned for serializing rdtsc is cpuid. Unfortunately, cpuid takes roughly 1000 cycles on my system, so I am wondering if anyone knows of a cheaper (fewer cycles and no read or write to memory) serializing instruction? I looked at iret, but that seems to change control flow, which is also undesirable. I have actually looked at the white paper linked in Alex's answer about rdtscp, but it

Assembler instruction: rdtsc

自古美人都是妖i submitted on 2019-12-06 04:30:38
Question: Could someone help me understand the assembler given in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html? It goes like this:
    uint64_t msr;
    asm volatile ( "rdtsc\n\t"           // Returns the time in EDX:EAX.
                   "shl $32, %%rdx\n\t"  // Shift the upper bits left.
                   "or %%rdx, %0"        // 'Or' in the lower bits.
                   : "=a" (msr)
                   :
                   : "rdx");
How is it different from:
    uint64_t msr;
    asm volatile ( "rdtsc\n\t"
                   : "=a" (msr));
Why do we need the shift and or operations, and what does rdx at the end do? EDIT: added what is

Is there a cheaper serializing instruction than cpuid?

十年热恋 submitted on 2019-12-04 15:35:37
I have seen the related questions, including here and here, but it seems that the only instruction ever mentioned for serializing rdtsc is cpuid. Unfortunately, cpuid takes roughly 1000 cycles on my system, so I am wondering if anyone knows of a cheaper (fewer cycles and no read or write to memory) serializing instruction? I looked at iret, but that seems to change control flow, which is also undesirable. I have actually looked at the white paper linked in Alex's answer about rdtscp, but it says: The RDTSCP instruction waits until all previous instructions have been executed before reading

How to ensure that RDTSC is accurate?

╄→尐↘猪︶ㄣ submitted on 2019-12-04 14:31:47
I've read that RDTSC can give false readings and should not be relied upon. Is this true, and if so, what can be done about it?
Very old CPUs have an RDTSC that is accurate.
The problem
However, newer CPUs have a problem. Engineers decided that RDTSC would be great for telling time. However, if a CPU throttles its frequency, RDTSC is useless for telling time. The aforementioned braindead engineers then decided to 'fix' this problem by having the TSC always run at the same frequency, even if the CPU slows down. This has the 'advantage' that the TSC can be used for telling elapsed (wall clock) time.
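Whether a particular CPU has the constant-rate TSC described above can be queried at run time. The following is a minimal sketch, assuming GCC or Clang on x86 and their <cpuid.h> helper __get_cpuid; it reads CPUID leaf 0x80000007, where EDX bit 8 is the "invariant TSC" flag. The function name has_invariant_tsc is hypothetical.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* __get_cpuid (GCC/Clang helper) */
#endif

/* Returns 1 if CPUID.80000007H:EDX bit 8 (invariant TSC) is set,
   0 if it is clear, the leaf is unavailable, or this is not an x86 build. */
int has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
#else
    return 0;
#endif
}
```

A result of 1 means the TSC ticks at a fixed rate regardless of frequency scaling, so differences between readings track elapsed time rather than core clock cycles.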

Assembler instruction: rdtsc

六眼飞鱼酱① submitted on 2019-12-04 09:49:57
Could someone help me understand the assembler given in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html? It goes like this:
    uint64_t msr;
    asm volatile ( "rdtsc\n\t"           // Returns the time in EDX:EAX.
                   "shl $32, %%rdx\n\t"  // Shift the upper bits left.
                   "or %%rdx, %0"        // 'Or' in the lower bits.
                   : "=a" (msr)
                   :
                   : "rdx");
How is it different from:
    uint64_t msr;
    asm volatile ( "rdtsc\n\t"
                   : "=a" (msr));
Why do we need the shift and or operations, and what does rdx at the end do? EDIT: added what is still unclear to the original question. What does "\n\t" do? What do ":" do? delimiters output/input
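For comparison, here is a sketch of an alternative formulation (not the one from the GCC manual) that makes the role of each register explicit. RDTSC places the low 32 bits of the counter in EAX and the high 32 bits in EDX, so a version that only binds "=a" sees just the low half and never tells the compiler that EDX was overwritten. Declaring both registers as outputs lets the shift and OR be written in C. The names combine_edx_eax and read_tsc_split are hypothetical.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that RDTSC leaves in EDX (high) and EAX (low). */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Read the TSC with both result registers declared as outputs. */
uint64_t read_tsc_split(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc"
                      : "=a" (lo), "=d" (hi)  /* outputs: EAX -> lo, EDX -> hi */
                      :                       /* no inputs */
                      :                       /* no extra clobbers needed */);
    return combine_edx_eax(hi, lo);
#else
    return 0;  /* not an x86 build; RDTSC is unavailable */
#endif
}
```

Because EDX is an output here, no "rdx" clobber is needed. In the manual's version the clobber is what informs the compiler that RDX is modified, since only RAX is listed as an output. The colons separate the template string, the output operands, the input operands and the clobber list, in that order.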

rdtsc timing for measuring a function

不问归期 submitted on 2019-12-04 06:54:25
Question: I want to time a function call with rdtsc, so I measured it in two ways:
1. Call it in a loop. Aggregate each rdtsc difference within the loop and divide by the number of calls (let's say this is N).
2. Call it in a loop. Take the rdtsc difference of the loop itself and divide by N.
But I see a couple of inconsistent behaviors. When I increase N, the times decrease rather monotonically in both method 1 and method 2. For method 2 this is understandable, in that it would amortize the loop control
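The two measurement methods can be sketched as follows, assuming GCC or Clang on x86 and the __rdtsc intrinsic from <x86intrin.h>. The function work is a hypothetical stand-in for the function being timed, and read_counter, method1_per_call and method2_whole_loop are illustrative names, not taken from the question.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc */
#endif

/* Reads the time-stamp counter; returns 0 on non-x86 builds. */
uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

/* Hypothetical function under test; the volatile sink keeps the call
   from being optimized away. */
volatile uint64_t sink;
void work(void) { sink += 1; }

/* Method 1: time every call separately, sum the differences, divide by n.
   Each sample also includes the cost of the two counter reads. n must be > 0. */
double method1_per_call(unsigned n)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < n; i++) {
        uint64_t t0 = read_counter();
        work();
        total += read_counter() - t0;
    }
    return (double)total / n;
}

/* Method 2: time the whole loop once and divide by n.
   Loop overhead and the two counter reads are amortized over n calls. n must be > 0. */
double method2_whole_loop(unsigned n)
{
    uint64_t t0 = read_counter();
    for (unsigned i = 0; i < n; i++)
        work();
    return (double)(read_counter() - t0) / n;
}
```

Both functions return an average in TSC ticks per call. Neither version serializes the counter reads, so instructions may still be reordered around them; that is a simplification of this sketch.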