This generally requires some knowledge of the “geometry” of the cache and of other aspects of the system. It also helps to have more than simple user access to the system, and to have implementation-dependent facilities such as finer-grained timing than the standard C clock mechanism supplies.
Here is an initial approach:
- Write a routine that takes a pointer to memory, a length, and a number of repetitions and reads all of that memory in consecutive order, repeatedly.
- Write a routine that takes a pointer to memory, a length, and a number of repetitions and writes to all of that memory in consecutive order, repeatedly.
- The above routines may have to convert their pointers to volatile-qualified pointers to prevent the compiler from optimizing away accesses that otherwise have no effect.
- Allocate a large amount of memory.
- Call each of the above routines with a variety of lengths, getting the current time before and after each call, to see how the time varies with length.
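The steps above can be sketched as follows. This is a minimal illustration, not a finished benchmark: the function names, the use of the standard clock() timer, and the particular sizes are all illustrative choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Read all bytes of [p, p+length) consecutively, `repetitions` times.
   The volatile qualifier keeps the compiler from deleting loads whose
   results would otherwise be unused. */
static unsigned read_memory(const volatile unsigned char *p,
                            size_t length, size_t repetitions)
{
    unsigned sum = 0;
    for (size_t r = 0; r < repetitions; ++r)
        for (size_t i = 0; i < length; ++i)
            sum += p[i];
    return sum;
}

/* Write all bytes of [p, p+length) consecutively, `repetitions` times. */
static void write_memory(volatile unsigned char *p,
                         size_t length, size_t repetitions)
{
    for (size_t r = 0; r < repetitions; ++r)
        for (size_t i = 0; i < length; ++i)
            p[i] = (unsigned char) i;
}

/* Time both routines over a range of lengths.  Doubling the length and
   halving the repetitions keeps the total work roughly constant, so the
   per-byte speeds at different lengths can be compared directly. */
static void measure_sweep(size_t max_length)
{
    unsigned char *buffer = malloc(max_length);
    if (!buffer)
        return;
    memset(buffer, 0, max_length);

    for (size_t length = 1024; length <= max_length; length *= 2) {
        size_t repetitions = max_length / length;

        clock_t t0 = clock();
        read_memory(buffer, length, repetitions);
        clock_t t1 = clock();
        write_memory(buffer, length, repetitions);
        clock_t t2 = clock();

        printf("%8zu bytes: read %ld ticks, write %ld ticks\n",
               length, (long) (t1 - t0), (long) (t2 - t1));
    }
    free(buffer);
}
```

A real program would call measure_sweep with a size well beyond the largest cache level of interest, for example 64 MiB or more.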
When you do this, you will typically see fast speeds (number of bytes read/written per second) for small lengths and slower speeds for longer lengths. The speed decreases will occur where the sizes of the different levels of cache are exceeded. So you are quite likely to see the sizes of L1 and L2 cache reflected in data collected using the above technique.
Here are some reasons that approach is inadequate:
- It does not control the instructions used to read or write cache. The C compiler may well generate load-word and store-word instructions, but many modern processors have instructions that can load and store 16 bytes at a time, and reading and writing may be faster with those instructions than with four-byte word instructions.
- Cache behaves differently when you access it sequentially than when you access it randomly. Most caches make some sort of attempt to track when data is used, so that recently used data is kept in cache while other data is cast out. The access patterns of real programs generally differ from the consecutive operations described above.
- In particular, consecutive writes to memory may be able to fill an entire cache line, so that nothing needs to be read from memory, whereas a real-world usage pattern that writes only one word to a particular location may have to be implemented by reading the cache line from memory and merging in the changed bytes.
- Competition from other processes on your system will interfere with what is in cache and with measurement.
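The instruction-width point can be explored by varying the unit of access in the read routine. A minimal sketch, assuming the length is a multiple of eight; both routines compute the same sum so one can be checked against the other (a real benchmark would accumulate whole words, and on x86 might go further with 16-byte SSE loads):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sum `length` bytes one byte at a time. */
static uint64_t sum_bytes(const unsigned char *p, size_t length)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < length; ++i)
        sum += p[i];
    return sum;
}

/* Sum the same bytes eight at a time.  memcpy expresses an unaligned
   8-byte load portably; compilers turn it into a single load instruction.
   The per-byte extraction below only serves to make the result match
   sum_bytes; a benchmark interested purely in bandwidth would just
   accumulate the words. */
static uint64_t sum_words(const unsigned char *p, size_t length)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < length; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);      /* one 8-byte load, not eight 1-byte loads */
        for (int b = 0; b < 8; ++b)
            sum += (w >> (8 * b)) & 0xFF;
    }
    return sum;
}
```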
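For the access-pattern point, a common technique is pointer chasing through a random cyclic permutation: every load depends on the previous one, which defeats hardware prefetching and exposes the cache's replacement behavior. A sketch, using Sattolo's algorithm to guarantee a single cycle (the choice of rand() as the shuffle source is an illustrative assumption):

```c
#include <stdlib.h>
#include <stddef.h>

/* Build a random cyclic permutation of `count` slots.  Sattolo's variant
   of the Fisher-Yates shuffle (j strictly below i) guarantees the result
   is one cycle visiting every slot. */
static void build_cycle(size_t *next, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        next[i] = i;
    for (size_t i = count - 1; i > 0; --i) {
        size_t j = (size_t) rand() % i;   /* j < i: Sattolo's variant */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Follow the cycle for `steps` hops.  Each hop is a dependent load, so
   the processor cannot overlap or predict the accesses; timing this loop
   measures something much closer to raw access latency. */
static size_t chase(const size_t *next, size_t steps)
{
    size_t i = 0;
    for (size_t s = 0; s < steps; ++s)
        i = next[i];
    return i;
}
```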
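The write-allocate point can be probed by writing only one byte per cache line instead of filling lines completely. The 64-byte line size below is an assumption for illustration; real code should query the processor rather than hard-code it:

```c
#include <stddef.h>

#define ASSUMED_LINE_SIZE 64  /* assumption; query the CPU in real code */

/* Touch one byte in each presumed cache line of [p, p+length).  Each
   store dirties a line that write-allocate hardware may first have to
   read from memory, so the useful bandwidth is far lower than in a
   consecutive-write loop that fills whole lines. */
static size_t touch_lines(volatile unsigned char *p, size_t length)
{
    size_t touched = 0;
    for (size_t i = 0; i < length; i += ASSUMED_LINE_SIZE) {
        p[i] = 1;
        ++touched;
    }
    return touched;
}
```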
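For the interference point, one common mitigation is to repeat each measurement and keep the minimum, on the theory that competing processes and interrupts only ever add time. A trivial sketch:

```c
#include <stddef.h>

/* Given several timings of the same operation, keep the smallest:
   interference only ever lengthens a run, so the minimum is the best
   available estimate of the undisturbed cost. */
static long min_timing(const long *times, int count)
{
    long best = times[0];
    for (int i = 1; i < count; ++i)
        if (times[i] < best)
            best = times[i];
    return best;
}
```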