Are there any downsides to passing structs by value in C, rather than passing a pointer?

落花浮王杯 提交于 2019-11-26 23:24:37

For small structs (eg point, rect) passing by value is perfectly acceptable. But, apart from speed, there is one other reason why you should be careful passing/returning large structs by value: Stack space.

A lot of C programming is for embedded systems, where memory is at a premium, and stack sizes may be measured in KB or even Bytes... If you're passing or returning structs by value, copies of those structs will get placed on the stack, potentially causing the situation that this site is named after...

If I see an application that seems to have excessive stack usage, structs passed by value is one of the things I look for first.

One reason not to do this which has not been mentioned is that this can cause an issue where binary compatibility matters.

Depending on the compiler used, structures can be passed via the stack or registers depending on compiler options/implementation

See: http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

-fpcc-struct-return

-freg-struct-return

If two compilers disagree, things can blow up. Needless to say the main reasons not to do this are illustrated are stack consumption and performance reasons.

kizzx2

To really answer this question, one needs to dig deep into the assembly land:

(The following example uses gcc on x86_64. Anyone is welcome to add other architectures like MSVC, ARM, etc.)

Let's have our example program:

// foo.c

typedef struct
{
    double x, y;
} point;

void give_two_doubles(double * x, double * y)
{
    *x = 1.0;
    *y = 2.0;
}

point give_point()
{
    point a = {1.0, 2.0};
    return a;
}

int main()
{
    return 0;
}

Compile it with full optimizations

gcc -Wall -O3 foo.c -o foo

Look at the assembly:

objdump -d foo | vim -

This is what we get:

0000000000400480 <give_two_doubles>:
    400480: 48 ba 00 00 00 00 00    mov    $0x3ff0000000000000,%rdx
    400487: 00 f0 3f 
    40048a: 48 b8 00 00 00 00 00    mov    $0x4000000000000000,%rax
    400491: 00 00 40 
    400494: 48 89 17                mov    %rdx,(%rdi)
    400497: 48 89 06                mov    %rax,(%rsi)
    40049a: c3                      retq   
    40049b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

00000000004004a0 <give_point>:
    4004a0: 66 0f 28 05 28 01 00    movapd 0x128(%rip),%xmm0
    4004a7: 00 
    4004a8: 66 0f 29 44 24 e8       movapd %xmm0,-0x18(%rsp)
    4004ae: f2 0f 10 05 12 01 00    movsd  0x112(%rip),%xmm0
    4004b5: 00 
    4004b6: f2 0f 10 4c 24 f0       movsd  -0x10(%rsp),%xmm1
    4004bc: c3                      retq   
    4004bd: 0f 1f 00                nopl   (%rax)

Excluding the nopl pads, give_two_doubles() has 27 bytes while give_point() has 29 bytes. On the other hand, give_point() yields one fewer instruction than give_two_doubles()

What's interesting is that we notice the compiler has been able to optimize mov into the faster SSE2 variants movapd and movsd. Furthermore, give_two_doubles() actually moves data in and out from memory, which makes things slow.

Apparently much of this may not be applicable in embedded environments (which is where the playing field for C is most of the time nowdays). I'm not an assembly wizard so any comments would be welcome!

Ilya

Simple solution will be return an error code as a return value and everything else as a parameter in the function,
This parameter can be a struct of course but don't see any particular advantage passing this by value, just sent a pointer.
Passing structure by value is dangerous, you need to be very careful what are you passing are, remember there is no copy constructor in C, if one of structure parameters is a pointer the pointer value will be copied it might be very confusing and hard to maintain.

Just to complete the answer (full credit to Roddy ) the stack usage is another reason not pass structure by value, believe me debugging stack overflow is real PITA.

Replay to comment:

Passing struct by pointer meaning that some entity has an ownership on this object and have a full knowledge of what and when should be released. Passing struct by value create a hidden references to the internal data of struct (pointers to another structures etc .. ) at this is hard to maintain (possible but why ?) .

I think that your question has summed things up pretty well.

One other advantage of passing structs by value is that memory ownership is explicit. There is no wondering about if the struct is from the heap, and who has the responsibility for freeing it.

I'd say passing (not-too-large) structs by value, both as parameters and as return values, is a perfectly legitimate technique. One has to take care, of course, that the struct is either a POD type, or the copy semantics are well-specified.

Update: Sorry, I had my C++ thinking cap on. I recall a time when it was not legal in C to return a struct from a function, but this has probably changed since then. I would still say it's valid as long as all the compilers you expect to use support the practice.

Mecki

One thing people here have forgotten to mention so far (or I overlooked it) is that structs usually have a padding!

struct {
  short a;
  char b;
  short c;
  char d;
}

Every char is 1 byte, every short is 2 bytes. How large is the struct? Nope, it's not 6 bytes. At least not on any more commonly used systems. On most systems it will be 8. The problem is, the alignment is not constant, it's system dependent, so the same struct will have different alignment and different sizes on different systems.

Not only that padding will further eat up your stack, it also adds the uncertainty of not being able to predict the padding in advance, unless you know how your system pads and then look at every single struct you have in your app and calculate the size for it. Passing a pointer takes a predictable amount of space -- there is no uncertainty. The size of a pointer is known for the system, it is always equal, regardless of what the struct looks like and pointer sizes are always chosen in a way that they are aligned and need no padding.

Here's something no one mentioned:

void examine_data(const char *c, size_t l)
{
    c[0] = 'l'; // compiler error
}

void examine_data(const struct blob blob)
{
    blob.ptr[0] = 'l'; // perfectly legal, quite likely to blow up at runtime
}

Members of a const struct are const, but if that member is a pointer (like char *), it becomes char *const rather than the const char * we really want. Of course, we could assume that the const is documentation of intent, and that anyone who violates this is writing bad code (which they are), but that's not good enough for some (especially those who just spent four hours tracking down the cause of a crash).

The alternative might be to make a struct const_blob { const char *c; size_t l } and use that, but that's rather messy - it gets into the same naming-scheme problem I have with typedefing pointers. Thus, most people stick to just having two parameters (or, more likely for this case, using a string library).

Jingguo Yao

Page 150 of PC Assembly Tutorial on http://www.drpaulcarter.com/pcasm/ has a clear explanation about how C allows a function to return a struct:

C also allows a structure type to be used as the return value of a func- tion. Obviously a structure can not be returned in the EAX register. Different compilers handle this situation differently. A common solution that compilers use is to internally rewrite the function as one that takes a structure pointer as a parameter. The pointer is used to put the return value into a structure defined outside of the routine called.

I use the following C code to verify the above statement:

struct person {
    int no;
    int age;
};

struct person create() {
    struct person jingguo = { .no = 1, .age = 2};
    return jingguo;
}

int main(int argc, const char *argv[]) {
    struct person result;
    result = create();
    return 0;
}

Use "gcc -S" to generate assembly for this piece of C code:

    .file   "foo.c"
    .text
.globl create
    .type   create, @function
create:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    movl    8(%ebp), %ecx
    movl    $1, -8(%ebp)
    movl    $2, -4(%ebp)
    movl    -8(%ebp), %eax
    movl    -4(%ebp), %edx
    movl    %eax, (%ecx)
    movl    %edx, 4(%ecx)
    movl    %ecx, %eax
    leave
    ret $4
    .size   create, .-create
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $20, %esp
    leal    -8(%ebp), %eax
    movl    %eax, (%esp)
    call    create
    subl    $4, %esp
    movl    $0, %eax
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits

The stack before call create:

        +---------------------------+
ebp     | saved ebp                 |
        +---------------------------+
ebp-4   | age part of struct person | 
        +---------------------------+
ebp-8   | no part of struct person  |
        +---------------------------+        
ebp-12  |                           |
        +---------------------------+
ebp-16  |                           |
        +---------------------------+
ebp-20  | ebp-8 (address)           |
        +---------------------------+

The stack right after calling create:

        +---------------------------+
        | ebp-8 (address)           |
        +---------------------------+
        | return address            |
        +---------------------------+
ebp,esp | saved ebp                 |
        +---------------------------+

I just want to point one advantage of passing your structs by value is that an optimizing compiler may better optimize your code.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!