Say I have the de facto standard x86 CPU with three levels of cache, L1/L2 private to each core, and L3 shared among cores. Is there a way to allocate shared memory whose data will not be cached in the cores' private caches?
I believe you should not (and probably cannot) care, and should simply hope that the shared memory ends up in L3. BTW, user-space C code runs in a virtual address space, and your other cores might (and often do) run some other, unrelated process.
The hardware and the MMU (which is configured by the kernel) will ensure that L3 is properly shared.
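For reference, here is a minimal sketch of how such shared memory is usually obtained in user-space (the name "/my_shm_region" is made up); notice that nothing in it lets you choose a cache level, since the hardware decides that:

```c
/* Sketch only: ordinary POSIX shared memory with a hypothetical name.
   Which cache level the data sits in is decided by the hardware, not here. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const size_t len = 4096;
    int fd = shm_open("/my_shm_region", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(p, "hello from one process"); /* another process mapping the same
                                            name sees this data */
    munmap(p, len);
    close(fd);
    shm_unlink("/my_shm_region");
    return 0;
}
```

(On older glibc you may need to link with -lrt.)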
but I'd like to experiment with performance with and without bringing the shared data into private caches.
As far as I understand recent Intel hardware (quite poorly), this is not possible (at least not from user-land).
Maybe you might consider the PREFETCH machine instruction and the __builtin_prefetch GCC builtin, but these do the opposite of what you want: they bring data into closer caches.
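To illustrate, a minimal sketch of how __builtin_prefetch is typically used (the loop, the prefetch distance of 16 elements, and the locality hint are arbitrary choices):

```c
/* Sketch: __builtin_prefetch pulls data *towards* the core's private caches,
   i.e. the opposite of keeping it out of them. */
#include <stddef.h>

long sum_with_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0 /* read */, 3 /* high locality */);
        s += a[i];
    }
    return s;
}
```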
BTW, the kernel does preemptive scheduling, so context switches can happen at any moment (often several hundred times per second). When, at a context switch, another process is scheduled on the same core, the MMU has to be reconfigured (each process has its own virtual address space) and the caches become "cold" again.
You might be interested in processor affinity; see sched_setaffinity(2). Read about Real-Time Linux. See also sched(7) and numa(7).
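For example, a minimal sketch of pinning the calling process to a single core with sched_setaffinity(2) (core 2 is an arbitrary choice):

```c
/* Sketch: pin the calling process to one core (core 2 chosen arbitrarily),
   so its cache-sensitive code is not migrated between private L1/L2 caches. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);               /* run only on logical CPU 2 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ... latency-sensitive work here ... */
    return 0;
}
```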
I am not at all sure that the performance hit you are afraid of is noticeable (and I believe it is not avoidable in user-space).
Perhaps you might consider moving your sensitive code into kernel space (so it runs with CPL0 privilege), but that probably requires months of work and is probably not worth the effort. I won't even try it.
Have you considered completely different approaches to your latency-sensitive code, e.g. rewriting it in OpenCL for your GPGPU?