C/C++: Optimization of pointers to string constants

清歌不尽 2020-11-27 08:09

Have a look at this code:

#include <iostream>
using namespace std;

int main()
{
    const char* str0 = "Watchmen";
    const char* str1 = "Watchmen [...]

6 Answers
  • 2020-11-27 08:15

    It can't be relied on; it is an optimization that no standard requires. Both C and C++ leave it unspecified whether string literals occupy distinct storage.

    I changed the corresponding lines of your code to:

    const char* str0 = "Watchmen";
    const char* str1 = "atchmen";
    char* str2 = "tchmen";
    char* str3 = "chmen";
    

    The output for the -O0 optimization level is:

    0x8048830
    0x8048839
    0x8048841
    0x8048848
    

    But for -O1 it's:

    0x80487c0
    0x80487c1
    0x80487c2
    0x80487c3
    

    As you can see, GCC (v4.1.2) stored only the first string and pointed each of the shorter strings at a tail of it. How string constants are arranged in memory is the compiler's choice.

  • 2020-11-27 08:18

    You shouldn't count on that, of course. An optimizer might play tricks on you, and it should be allowed to do so.

    It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.

    He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:

    if (3 = x) break;
    

    Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.

    Anyway, it's clear the C compiler was putting all literal 3's in the same place. If a C compiler back in the 80's was capable of doing this, it can't be too tough to do. I'd expect it to be very common.

  • 2020-11-27 08:29

    I would not rely on this behavior, because I doubt the C or C++ standards guarantee it, but it makes sense that the compiler does it. It also makes sense that it does so even when no optimization is requested; there is no trade-off in it.

    All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:

    char *s = "literal";
    

    You are, in a sense, discarding the const qualifier (standard C++ has disallowed this conversion since C++11, although many compilers only warn about it). Nevertheless, you can't do away with the read-only nature of the string: modifying it is undefined behavior, so at best you'll be caught at run time rather than at compile time. (Which is actually a good reason to use const char * when pointing a variable of yours at a string literal.)

  • 2020-11-27 08:33

    No, it can't be relied on, but storing read-only string constants in a pool is a pretty easy and effective optimization. It's just a matter of storing an alphabetical list of strings, and then outputting them into the object file at the end. Think of how many "\n" or "" constants are in an average code base.

    If a compiler wanted to get extra fancy, it could reuse suffixes too: "\n" can be represented by pointing to the last character of "Hello\n". But that likely comes with very little benefit for a significant increase in complexity.

    Anyway, I don't believe the standard says anything about where anything is stored really. This is going to be a very implementation-specific thing. If you put two of those declarations in a separate .cpp file, then things will likely change too (unless your compiler does significant linking work.)

  • 2020-11-27 08:34

    It's an extremely easy optimization, probably so much so that most compiler writers don't even consider it much of an optimization at all. Setting the optimization flag to the lowest level doesn't mean "Be completely naive," after all.

    Compilers will vary in how aggressive they are at merging duplicate string literals. They might limit themselves to a single subroutine — put those four declarations in different functions instead of a single function, and you might see different results. Others might do an entire compilation unit. Others might rely on the linker to do further merging among multiple compilation units.

    You can't rely on this behavior, unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. I'd be wary about relying on it in my own code, even if portability weren't a concern, because behavior is liable to change even between different versions of a single vendor's compiler.

  • 2020-11-27 08:34

    You surely should not rely on that behavior, but most compilers will do this. A literal value ("Hello", 42, etc.) is typically stored once, and any pointers to it then naturally resolve to that single copy.

    If you find that you need to rely on that, then be safe and recode as follows:

    const char *watchmen = "Watchmen";
    const char *foo = watchmen;
    const char *bar = watchmen;
    