Why does malloc() fail when there is enough memory?

遇见更好的自我 2021-01-08 00:21

I'm using a server with 128 GB of memory to do some computation. I need to malloc() a 2D float array of size 56120 * 56120. Example code is as follows:

5 Answers
  • 2021-01-08 00:51

    As others have pointed out, 56120*56120 overflows int math on OP's platform. That is undefined behavior (UB).

    malloc(size_t x) takes a size_t argument, and the value passed to it is best calculated using at least size_t math. Reversing the multiplication order accomplishes this: sizeof(float) * num causes num to be widened to at least size_t before the multiplication.

    int num = 56120,i,j;
    // ls = (float *)malloc((num * num)*sizeof(float));
    ls = (float *) malloc(sizeof(float) * num * num);
    

    Even though this prevents UB, it does not prevent overflow, since mathematically sizeof(float)*56120*56120 may still exceed SIZE_MAX.

    Code could detect potential overflow beforehand.

    if (num < 0 || SIZE_MAX/sizeof(float)/num < num) Handle_Error();
    

    No need to cast the result of malloc().
    Using the size of the referenced variable is easier to code and maintain than sizing to the type.
    When num == 0, malloc(0) == NULL does not necessarily indicate an out-of-memory condition.
    All together:

    int num = 56120;
    if (num < 0 || ((num > 0) && SIZE_MAX/(sizeof *ls)/num < num)) {
      Handle_Error();
    }
    ls = malloc(sizeof *ls * num * num);
    if (ls == NULL && num != 0) {
      Handle_OOM();
    }
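
The check-then-allocate pattern above can be wrapped in a small helper. The following is only a sketch: the names square_bytes and alloc_square are hypothetical and not part of the original code, and it assumes the caller wants a flat num * num array of float.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores num * num * elem_size in *out using size_t math.
   Returns 0 on success, -1 if num is negative or the product exceeds SIZE_MAX. */
int square_bytes(int num, size_t elem_size, size_t *out) {
    if (num < 0) {
        return -1;
    }
    size_t n = (size_t)num;
    /* Divide instead of multiplying so the check itself cannot wrap. */
    if (n != 0 && elem_size != 0 && SIZE_MAX / elem_size / n < n) {
        return -1;
    }
    *out = elem_size * n * n;
    return 0;
}

/* Hypothetical helper: allocates a flat num * num array of float.
   Returns NULL on an invalid size or on allocation failure.
   For num == 0, malloc(0) may legitimately return NULL. */
float *alloc_square(int num) {
    size_t bytes;
    if (square_bytes(num, sizeof(float), &bytes) != 0) {
        return NULL;
    }
    return malloc(bytes);
}
```

A caller would use it as float *ls = alloc_square(num); and release the array with free(ls).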
    
  • 2021-01-08 00:53

    The problem is that your calculation

    (num * num) * sizeof(float)
    

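    is done partly in 32-bit signed integer arithmetic. num * num overflows int and, on typical two's-complement platforms, wraps to -1145512896. For num=56120, multiplying that by sizeof(float), which is 4, corresponds to

    -4582051584
    

    which, converted to a 64-bit size_t, is the huge value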

    18446744069127500032
    

    You do not have that much memory ;) This is why malloc() fails.

    Cast num to size_t in the size calculation passed to malloc(); then it should work as expected.
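
To see where those numbers come from without invoking undefined behavior, the wraparound can be reproduced with unsigned arithmetic. This is a sketch that assumes a 32-bit int, a 64-bit size_t, and a 4-byte float; the function names wrapped_size and correct_size are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates (num * num) * sizeof(float) for a 32-bit int that wraps on
   overflow. The multiplication is done in uint32_t, which is well defined,
   and the result is then reinterpreted as a signed 32-bit value before
   being widened to size_t. */
size_t wrapped_size(int32_t num) {
    uint32_t product = (uint32_t)num * (uint32_t)num;
    int32_t as_int = (int32_t)product; /* implementation-defined; wraps on common platforms */
    return (size_t)as_int * sizeof(float);
}

/* The suggested fix: convert to size_t before multiplying. */
size_t correct_size(int num) {
    return (size_t)num * (size_t)num * sizeof(float);
}
```

Under those assumptions, wrapped_size(56120) yields 18446744069127500032, while correct_size(56120) yields 12597817600 bytes, roughly 12.6 GB.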

  • 2021-01-08 00:56

    float ls[3149454400]; is an array with automatic storage duration, which is usually allocated on the process stack. The stack size is limited by default to a value much smaller than the 12GB you are attempting to place there. So the segmentation fault you are observing is caused by a stack overflow rather than by malloc.
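
One way to confirm this on a POSIX system is to compare the requested size against the soft stack limit. The sketch below assumes getrlimit() from <sys/resource.h> is available; the helper name exceeds_stack_limit is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if `bytes` exceeds the soft stack limit,
   0 if it does not (or the limit is unlimited), and -1 if the limit
   cannot be queried. */
int exceeds_stack_limit(unsigned long long bytes) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 0;
    }
    return bytes > (unsigned long long)rl.rlim_cur;
}
```

With a default limit of a few megabytes (often 8 MB on Linux), exceeds_stack_limit(12597817600ULL) returns 1, which is why an array of this size belongs on the heap.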

  • 2021-01-08 00:59
    int num = 56120,i,j;
    ls = (float *)malloc((num * num)*sizeof(float));
    

    num * num is 56120*56120, which is 3149454400. That overflows a signed int, which causes undefined behavior.

    The reason 40000 works is that 40000*40000 is representable as an int.

    Change the type of num to long long (or even unsigned int).
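
The same overflow can reappear when the flat array is indexed as a 2D grid, because i * num + j with int operands exceeds INT_MAX for the later rows. A sketch, using the hypothetical helper flat_index, that keeps the index math in size_t:

```c
#include <stddef.h>

/* Hypothetical helper: row-major offset of element (row, col) in a
   num x num array, computed in size_t rather than int. */
size_t flat_index(size_t row, size_t col, size_t num) {
    return row * num + col;
}
```

An element would then be accessed as ls[flat_index(i, j, num)], with i and j converted to size_t at the call.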

  • 2021-01-08 01:12

    This is in contrast to what others have written, but for me, changing the variable num from int to size_t allows the allocation. It could be that num*num overflows int before the value reaches malloc. Calling malloc with the literal 56120 * 56120 instead of num*num should make the compiler report an integer overflow.
