Bit Aligning for Space and Performance Boosts

闹比i 2021-02-14 23:03

In the book Game Coding Complete, 3rd Edition, the author mentions a technique to both reduce data structure size and increase access performance. In essence it relies

5 answers

抹茶落季 (original poster)

2021-02-14 23:13

It is highly dependent on the hardware.

Let me demonstrate:

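#include <cstring>   // memset
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
#include <iostream>  // cout, endl
using namespace std;

// Force 1-byte packing so the compiler inserts no padding between members.
#pragma pack( push, 1 )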

struct SlowStruct
{
    char c;
    __int64 a;
    int b;
    char d;
};

struct FastStruct
{
    __int64 a;
    int b;
    char c;
    char d;  
    char unused[ 2 ]; // fill to 8-byte boundary for array use
};

#pragma pack( pop )

int main (void){

    int x = 1000;
    int iterations = 10000000;

    SlowStruct *slow = new SlowStruct[x];
    FastStruct *fast = new FastStruct[x];



    //  Warm the cache.
    memset(slow,0,x * sizeof(SlowStruct));
    clock_t time0 = clock();
    for (int c = 0; c < iterations; c++){
        for (int i = 0; i < x; i++){
            slow[i].a += c;
        }
    }
    clock_t time1 = clock();
    cout << "slow = " << (double)(time1 - time0) / CLOCKS_PER_SEC << endl;
    
    //  Warm the cache.
    memset(fast,0,x * sizeof(FastStruct));
    time1 = clock();
    for (int c = 0; c < iterations; c++){
        for (int i = 0; i < x; i++){
            fast[i].a += c;
        }
    }
    clock_t time2 = clock();
    cout << "fast = " << (double)(time2 - time1) / CLOCKS_PER_SEC << endl;



    //  Print to avoid Dead Code Elimination
    __int64 sum = 0;
    for (int c = 0; c < x; c++){
        sum += slow[c].a;
        sum += fast[c].a;
    }
    cout << "sum = " << sum << endl;


    delete[] slow;
    delete[] fast;
    return 0;
}
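
To see why the two structs behave differently under `#pragma pack( push, 1 )`, it helps to look at where `a` actually lands in memory. Here is a minimal sketch, not part of the original benchmark. It assumes a 4-byte `int` and uses `std::int64_t` in place of the MSVC-specific `__int64`; the names `PackedSlow`, `PackedFast`, `slow_a_offset`, `fast_a_offset` and `report_packed_layout` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int64_t
#include <iostream>

#pragma pack(push, 1)
struct PackedSlow {          // same member order as SlowStruct
    char c;
    std::int64_t a;          // lands at offset 1 because nothing pads after c
    int b;                   // assumption: int is 4 bytes
    char d;
};
struct PackedFast {          // same member order as FastStruct
    std::int64_t a;          // lands at offset 0
    int b;
    char c;
    char d;
    char unused[2];          // manual padding up to 16 bytes
};
#pragma pack(pop)

// Byte offset of member `a` in element i, measured from the start of the array.
constexpr std::size_t slow_a_offset(std::size_t i) {
    return i * sizeof(PackedSlow) + offsetof(PackedSlow, a);
}
constexpr std::size_t fast_a_offset(std::size_t i) {
    return i * sizeof(PackedFast) + offsetof(PackedFast, a);
}

// Prints struct sizes and the offsets of `a` for the first few array elements.
void report_packed_layout() {
    assert(offsetof(PackedFast, a) == 0);  // a is the first member
    std::cout << "sizeof(PackedSlow) = " << sizeof(PackedSlow) << '\n'
              << "sizeof(PackedFast) = " << sizeof(PackedFast) << '\n';
    for (std::size_t i = 0; i < 4; ++i) {
        std::cout << "element " << i
                  << ": slow a at byte " << slow_a_offset(i)
                  << ", fast a at byte " << fast_a_offset(i) << '\n';
    }
}
```

Calling `report_packed_layout()` from `main` should show that, under these assumptions, the packed slow struct is 14 bytes with `a` at offset 1, so no element's `a` sits on an 8-byte boundary, while the packed fast struct is 16 bytes and every `a` does.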

Core i7 920 @ 3.5 GHz

slow = 4.578
fast = 4.434
sum = 99999990000000000

Okay, not much difference (about 3%), but it's consistent over multiple runs.
So alignment makes a small difference on the Nehalem Core i7.

Intel Xeon X5482 Harpertown @ 3.2 GHz (Core 2 - generation Xeon)

slow = 22.803
fast = 3.669
sum = 99999990000000000

Now take a look...

6.2x faster!!!

Conclusion:

You see the results. You decide whether or not it's worth your time to do these optimizations.

EDIT :

Same benchmarks but without the #pragma pack:

Core i7 920 @ 3.5 GHz

slow = 4.49
fast = 4.442
sum = 99999990000000000

Intel Xeon X5482 Harpertown @ 3.2 GHz

slow = 3.684
fast = 3.717
sum = 99999990000000000

The Core i7 numbers didn't change. Apparently it can handle misalignment without trouble for this benchmark.
The Core 2 Xeon now shows the same times for both versions. This confirms that misalignment is a problem on the Core 2 architecture.
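
One plausible contributor to the gap (my assumption, the benchmark above does not isolate it) is that some of the misaligned 8-byte fields straddle two cache lines, which older cores handle more slowly. The sketch below just counts how often that happens for a given layout. The function name `count_line_crossings` is hypothetical, and it assumes 64-byte cache lines and an array that starts on a cache-line boundary, which `new[]` does not guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many of `count` array elements have a field that spans two
// cache lines. Assumes the array starts exactly on a cache-line boundary.
//   stride       : size of one element in bytes
//   field_offset : offset of the field inside an element
//   field_size   : size of the field in bytes
//   line_size    : cache-line size in bytes (64 is an assumption here)
std::size_t count_line_crossings(std::size_t count, std::size_t stride,
                                 std::size_t field_offset,
                                 std::size_t field_size,
                                 std::size_t line_size) {
    assert(field_size != 0 && line_size != 0 && field_size <= line_size);
    std::size_t crossings = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t first = i * stride + field_offset;  // first byte of field
        std::size_t last = first + field_size - 1;      // last byte of field
        if (first / line_size != last / line_size) {
            ++crossings;  // field starts in one line and ends in the next
        }
    }
    return crossings;
}
```

For the packed slow layout (stride 14, `a` at offset 1, 8 bytes wide), `count_line_crossings(1000, 14, 1, 8, 64)` gives 125, so one access in eight touches two cache lines. For the fast layout, `count_line_crossings(1000, 16, 0, 8, 64)` gives 0.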

Taken from my comment:

If you leave out the #pragma pack, the compiler will keep everything aligned so you don't see this issue. So this is actually an example of what could happen if you misuse #pragma pack.
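
For comparison, here is a rough sketch of the same two member orders without any packing pragma, so the compiler is free to insert padding. The names `NaturalSlow`, `NaturalFast`, `a_is_always_aligned` and `report_natural_layout` are made up for illustration, and `std::int64_t` stands in for `__int64`. The exact sizes depend on the ABI, so the code prints them rather than hard-coding them.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int64_t
#include <iostream>

struct NaturalSlow {   // member order of SlowStruct, default alignment
    char c;
    std::int64_t a;    // compiler pads after c so that a is aligned
    int b;
    char d;            // compiler pads after d so array elements stay aligned
};

struct NaturalFast {   // largest members first, default alignment
    std::int64_t a;
    int b;
    char c;
    char d;
};

// True when `a` is aligned in every element of a suitably aligned array of T:
// both its offset and the element size are multiples of the alignment.
template <typename T>
constexpr bool a_is_always_aligned() {
    return offsetof(T, a) % alignof(std::int64_t) == 0 &&
           sizeof(T) % alignof(std::int64_t) == 0;
}

// Prints the sizes and offsets the compiler chose for both layouts.
void report_natural_layout() {
    assert(a_is_always_aligned<NaturalSlow>());
    assert(a_is_always_aligned<NaturalFast>());
    std::cout << "sizeof(NaturalSlow) = " << sizeof(NaturalSlow)
              << ", a at offset " << offsetof(NaturalSlow, a) << '\n'
              << "sizeof(NaturalFast) = " << sizeof(NaturalFast)
              << ", a at offset " << offsetof(NaturalFast, a) << '\n';
}
```

On common 64-bit ABIs this typically reports 24 bytes for the slow order and 16 for the fast one. Both keep `a` aligned, which fits the identical timings above, and ordering members from largest to smallest is what saves the space.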
