Try repeatedly reading or writing a large block of data with the CPU (i.e. not via DMA).
For example:
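#include <stdlib.h>

int main(void) {
    const int size = 20*1024*1024; // 20 MiB; must be much larger than L2 (and L3, if present)
    volatile char *c = (volatile char *)malloc(size); // volatile keeps the stores from being optimized away
    if (c == NULL)
        return 1;
    for (int i = 0; i < 0xffff; i++)
        for (int j = 0; j < size; j++)
            c[j] = (char)((unsigned)i * (unsigned)j); // unsigned math avoids signed overflow
    free((void *)c);
    return 0;
}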
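If you want the traffic to be reads rather than writes, a variant along these lines may work. It is only a sketch: `touch_buffer` and `evict_cache` are hypothetical names, and the 64-byte cache line is an assumption that is common on x86-64 but not guaranteed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed cache-line size; adjust for your CPU. */
#define CACHE_LINE 64

/* Reads one byte per assumed cache line and returns the sum, so the
 * loads have an observable result and cannot be discarded. */
unsigned long touch_buffer(const volatile unsigned char *buf, size_t size)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i += CACHE_LINE)
        sum += buf[i];
    return sum;
}

/* Hypothetical helper: allocates `size` bytes, fills them once, then
 * reads them back `passes` times.
 * Returns 0 on success, -1 if the allocation fails. */
int evict_cache(size_t size, int passes)
{
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    memset(buf, 1, size);              /* make sure the pages are really mapped */
    volatile unsigned long sink = 0;   /* keeps the read loop alive */
    for (int p = 0; p < passes; p++)
        sink += touch_buffer(buf, size);
    (void)sink;
    free(buf);
    return 0;
}
```

Calling something like `evict_cache(20u * 1024 * 1024, 4)` before the code you want to measure should displace most of what was cached, provided the size is well above your largest cache level.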
However, depending on the server, the disk cache (the page cache, held in RAM) may be a bigger problem than the L1/L2 caches. On Linux, for example, you can drop it as root with:
sync
echo 3 > /proc/sys/vm/drop_caches
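If you would rather do this from inside a test program, the same two steps can be sketched in C. `drop_page_cache` is a hypothetical helper, not an existing API; the path is passed in so the function can be tried on an ordinary file, and on Linux you would pass `/proc/sys/vm/drop_caches` while running as root.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flushes dirty pages, then writes "3" to `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int drop_page_cache(const char *path)
{
    sync();                        /* write dirty pages back first */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                 /* e.g. not root, or not Linux */
    int ok = fputs("3\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as `drop_page_cache("/proc/sys/vm/drop_caches")` returns -1 when the process lacks permission, so check the result before relying on the cache being cold.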
Edit: It is trivial to generate a large program that does nothing:
#!/usr/bin/ruby
puts "main:"
200000.times { puts " nop" }
puts " xor rax, rax"
puts " ret"
Assembling the generated code (the script's output, not the script itself) and running it a few times under different names should do the job.