Do Static Variables Impede Data Caching?

Question

From Optimizing Software in C++ (Section 7.1),

The advantage of static data is that it can be initialized to desired values before the program starts. The disadvantage is that the memory space is occupied throughout the whole program execution, even if the variable is only used in a small part of the program. This makes data caching less efficient.

The usage of static in this excerpt applies to both C and C++, specifically in the sense of static storage duration.

Can anyone shed some light on why (or whether) data caching is less efficient for static duration variables? Here is a specific comparison:

void foo() {
  static int static_arr[] = {/**/};
}
void bar() {
  int local_arr[] = {/**/};
}

I don't see any reason why static data would cache differently than any other kind of data. In the given example, I would think that foo will be faster because the execution stack doesn't have to load static_arr, whereas in bar, the execution stack has to load local_arr. In either case, if these functions were called repeatedly, both static_arr and local_arr will be cached. Am I wrong?

Answer 1:

The answer from rustyx explains it. Local variables are stored on the stack. The stack space is released when a function returns and reused when the next function is called. Caching is more efficient for local variables because the same memory space is reused again and again, while static variables are scattered around at different memory addresses that can never be reused for another purpose. Whether static data are stored in the DATA section (initialized) or the BSS section (uninitialized) makes no difference in this respect. The top-of-stack space is likely to stay cached throughout program execution and be reused many times.

Another advantage is that a limited number of local variables can be accessed with an 8-bit offset relative to the stack pointer, while static variables need a 32-bit absolute address (in 32-bit x86) or a 32-bit relative address (in x86-64). In other words, local variables may make the code more compact and improve utilization of the code cache as well as the data cache.

// Example
void f();  // declared before use so that main compiles
void g();

int main() {
  f();
  g();
  return 0;
}

void f() {
  int x;
  // ...
}

void g() {
  int y;  // y may occupy the same memory address as x
  // ...
}
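
The address reuse described in the example above can be observed directly. The following is a minimal sketch, not part of the original answer: all helper names are hypothetical, and whether the two locals really end up at the same address depends on the compiler, the optimisation level and inlining, so no particular output is promised. Only the static variable's address is guaranteed to stay the same for the whole run.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helpers: each returns the address of one of its own variables
// as an integer. The address is only printed or compared, never dereferenced.
std::uintptr_t address_of_local_in_f() {
    volatile int x = 1;  // automatic storage: lives in this call's stack frame
    return reinterpret_cast<std::uintptr_t>(&x);
}

std::uintptr_t address_of_local_in_g() {
    volatile int y = 2;  // may land in the stack slot that x just vacated
    return reinterpret_cast<std::uintptr_t>(&y);
}

std::uintptr_t address_of_static() {
    static int s = 3;    // static storage: one fixed address, never reused
    return reinterpret_cast<std::uintptr_t>(&s);
}

void run_address_demo() {
    const std::uintptr_t s1 = address_of_static();
    const std::uintptr_t s2 = address_of_static();
    assert(s1 == s2);  // the same object on every call

    std::printf("x in f: %#llx\n",
                static_cast<unsigned long long>(address_of_local_in_f()));
    std::printf("y in g: %#llx\n",
                static_cast<unsigned long long>(address_of_local_in_g()));
    std::printf("static: %#llx\n", static_cast<unsigned long long>(s1));
}
```

Calling run_address_demo() from main on a typical unoptimised build tends to print the same address for x and y, while the static variable sits in a different region of memory altogether.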

Answer 2:

In general, Agner Fog usually knows what he is talking about.

If we read the quote in the context of section 7.1 Different kinds of variable storage, we see what he means by "less efficient caching" in the beginning of the section:

Data caching is poor if data are scattered randomly around in the memory. It is therefore important to understand how variables are stored. The storage principles are the same for simple variables, arrays and objects.

So the idea behind saying that static variables are less cache-efficient is that the chance that the memory location where they are stored is "cold" (no longer in cache) is greater than with stack memory, which is where the variable with automatic storage duration would be stored.

With caching and paging in mind, it's the combination of spatial and temporal locality of data storage that affects performance.

Answer 3:

The statement does or does not make sense depending on how you punctuate it:

Reading 1:

The advantage of static data is that it can be initialized to desired values before the program starts. The disadvantage is that the memory space is occupied throughout the whole program execution, even if the variable is only used in a small part of the program.

All of the above makes data caching less efficient.

This is nonsense.

Reading 2:

The advantage of static data is that it can be initialized to desired values before the program starts.

The disadvantage is that the memory space is occupied throughout the whole program execution...

...even if the variable is only used in a small part of the program. There is a case where this could make data caching less efficient.

That case would be where the static variable has been allocated storage either in a page that is not always swapped in, or on a cache line that is rarely otherwise used. You may incur a cache miss, or theoretically in the worst case a page fault (although frankly, with the amount of physical memory at our disposal these days, if this happens you have bigger problems).

In the specific case you demonstrate, the answer would be, "it depends".

Yes, the initialisation of static_arr is a one-time-only operation and so can be thought of as costless.

Yes, the initialisation of local_arr happens each time the function is called, but it might be that:

this initialisation is trivial, or
the initialisation is elided by the compiler as part of an optimiser pass
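
The one-time versus per-call difference can be made visible with a small sketch. It is not from the original answer: make_value, g_init_calls and count_inits are hypothetical names, and a counting initialiser is used on purpose so that the compiler cannot elide it. The arrays in the question have constant initialisers instead, which a compiler is free to handle more cheaply.

```cpp
#include <cassert>

int g_init_calls = 0;  // hypothetical counter: how often the initialiser ran

int make_value() {
    ++g_init_calls;
    return 42;
}

int foo() {
    static int static_value = make_value();  // runs on the first call only
    return static_value;
}

int bar() {
    int local_value = make_value();          // runs on every call
    return local_value;
}

// Calls fn `calls` times and reports how many initialiser runs that caused.
int count_inits(int (*fn)(), int calls) {
    const int before = g_init_calls;
    for (int i = 0; i < calls; ++i) {
        fn();
    }
    return g_init_calls - before;
}

void run_init_demo() {
    assert(count_inits(bar, 3) == 3);  // automatic: initialised on each call
    foo();                             // make sure the static is initialised
    assert(count_inits(foo, 3) == 0);  // static: later calls skip the initialiser
}
```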

In general, unless you have a specific optimisation in mind, it is Better(tm) to write the code that explicitly states the behaviour you want, i.e.:

use static variables (variables with static storage duration) when the variable/array's value(s) should survive successive calls to the function.
use local variables (strictly, variables with automatic storage duration) when the existing values are meaningless on entry or exit from the function.

You will find that the compiler will, in almost all cases, do the most efficient thing after the optimisation pass(es).

There is a specific case where static storage duration is Better(tm): the case of (say) a buffer that requires dynamic allocation. You may not want to incur the cost of allocation/deallocation on every call. You may want the buffer to dynamically grow when needed, and stay grown on the basis that future operations may well need the memory again.

In this case, the actual state of the variable is the size of its allocated buffer. Thus, state is important on the function's entry and exit, e.g.:

  std::string const& to_string(json const& json_object)
  {
    static thread_local std::string buffer;   // per-thread, auto-expanding buffer storage
    buffer.clear();                           // does not release memory
    serialise_to_string(buffer, json_object); // may allocate memory occasionally
    return buffer;
  }
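
For readers who want to try the pattern, here is a self-contained sketch of it. The json struct and the body of serialise_to_string are hypothetical stand-ins, since the answer does not show them; only the shape of to_string follows the snippet above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the answer's json type.
struct json {
    int value;
};

// Hypothetical stand-in for the answer's serialiser: appends to `out`.
void serialise_to_string(std::string& out, json const& json_object) {
    out += "{\"value\":";
    out += std::to_string(json_object.value);
    out += '}';
}

std::string const& to_string(json const& json_object) {
    static thread_local std::string buffer;    // one buffer per thread, kept between calls
    buffer.clear();                            // empties the text; implementations typically keep the capacity
    serialise_to_string(buffer, json_object);  // allocates only if the buffer must grow
    return buffer;
}

void run_buffer_demo() {
    const std::string* first = &to_string(json{1});
    assert(*first == "{\"value\":1}");
    const std::string* second = &to_string(json{22});
    assert(first == second);  // the same per-thread buffer object is reused
    assert(*second == "{\"value\":22}");
}
```

One consequence of this design is that the returned reference is overwritten by the next call on the same thread, so a caller that needs the text for longer should copy it.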

Source: https://stackoverflow.com/questions/54743186/do-static-variables-impede-data-caching

Tags

c++

performance

caching