Large 2D array gives segmentation fault

一生所求 2020-12-02 08:08

I am writing some C++ code on Linux where I have declared a few 2D arrays like so:

 double x[5000][500], y[5000][500], z[5000][500];

During

7 answers
  • 2020-12-02 08:29

    These arrays are on the stack, and stacks are quite limited in size. You have probably run into a ... stack overflow :)

    If you want to avoid this, you need to put them on the free store:

    double* x = new double[5000*500];
    

    But you had better get into the good habit of using the standard containers, which wrap all this for you:

    std::vector< std::vector<double> > x( 5000, std::vector<double>(500) );
    

    Plus: even if the stack fits the arrays, you still need room for functions to put their frames on it.

  • 2020-12-02 08:35

    Your declaration should appear at top level, outside any procedure or method, so that the arrays get static storage instead of living on the stack.

    By far the easiest way to diagnose a segfault in C or C++ code is to use valgrind. If one of your arrays is at fault, valgrind will pinpoint exactly where and how. If the fault lies elsewhere, it will tell you that, too.

    valgrind can be used on any x86 binary but will give more information if you compile with gcc -g.

  • 2020-12-02 08:39

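    One reservation about always using vector: a vector can grow. If code appends past the intended size with push_back (or calls resize), the vector allocates a larger buffer and copies everything over, which can create subtle and hard-to-find errors when you are really trying to work with a fixed-size array. Plain indexing with operator[] does not grow the vector, though: out-of-range indexing is undefined behavior, just as with a raw array, while at() throws std::out_of_range. With a real array, walking far enough off the end will usually segfault, which makes the error easier to catch, although that is not guaranteed.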

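    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        /* Pointer to rows of 5000 doubles, so array5k[i][j] works. */
        typedef double (*array5k_t)[5000];

        /* One contiguous, zero-initialized 5000 x 5000 block on the heap (C, not C++). */
        array5k_t array5k = calloc(5000, sizeof(double) * 5000);
        if (array5k == NULL) {
            perror("calloc");
            return 1;
        }

        /* Out of bounds in both indices: undefined behavior, typically a segfault. */
        array5k[5000][5001] = 10;

        free(array5k);
        return 0;
    }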
    
  • 2020-12-02 08:40

    If your program looks like this ...

    int main(int, char **) {
       double x[5000][500],y[5000][500],z[5000][500];
       // ...
       return 0;
    }
    

    ... then you are overflowing the stack. The fastest way to fix this is to add the word static.

    int main(int, char **) {
       static double x[5000][500],y[5000][500],z[5000][500];
       // ...
       return 0;
    }
    

    The second fastest way to fix this is to move the declaration out of the function:

    double x[5000][500],y[5000][500],z[5000][500];
    int main(int, char **) {
       // ...
       return 0;
    }
    

    The third fastest way to fix this is to allocate the memory on the heap:

    int main(int, char **) {
       double **x = new double*[5000];
       double **y = new double*[5000];
       double **z = new double*[5000];
       for (size_t i = 0; i < 5000; i++) {
          x[i] = new double[500];
          y[i] = new double[500];
          z[i] = new double[500];
       }
       // ...
       for (size_t i = 5000; i > 0; ) {
          delete[] z[--i];
          delete[] y[i];
          delete[] x[i];
       }
       delete[] z;
       delete[] y;
       delete[] x;
    
       return 0;
    }
    

    The fourth fastest way is to allocate them on the heap using std::vector. It is fewer lines in your file but more lines in the compilation unit, and you must either think of a meaningful name for your derived vector types or tuck them into an anonymous namespace so they won't pollute the global namespace:

    #include <vector>
    using std::vector;
    namespace { 
      struct Y : public vector<double> { Y() : vector<double>(500) {} };
      struct XY : public vector<Y> { XY() : vector<Y>(5000) {} } ;
    }
    int main(int, char **) {
      XY x, y, z;
      // ...
      return 0;
    }
    

    The fifth fastest way is to allocate them on the heap, but use templates so the dimensions are not so remote from the objects:

    #include <vector>
    using namespace std;
    namespace {
      template <size_t N>
      struct Y : public vector<double> { Y() : vector<double>(N) {} };
      template <size_t N1, size_t N2>
      struct XY : public vector< Y<N2> > { XY() : vector< Y<N2> > (N1) {} } ;
    }
    int main(int, char **) {
      XY<5000,500> x, y, z;
      XY<500,50> mini_x, mini_y, mini_z;
      // ...
      return 0;
    }
    

    The most performant way is to allocate the two-dimensional arrays as one-dimensional arrays, and then use index arithmetic.

    All the above assumes that you have some reason, a good one or a poor one, for wanting to craft your own multidimensional array mechanism. If you have no reason, and expect to use multidimensional arrays again, strongly consider installing a library:

    • A plays-nicely-with-STL way is to use the Boost Multidimensional Array.

    • A fast option is Blitz++.

  • 2020-12-02 08:47

    Another option, besides the ones above, is to run

    ulimit -s stack_area
    

    in the shell before starting the program, to raise the maximum stack size (in bash, stack_area is given in kilobytes).

  • 2020-12-02 08:48

    Looks to me like you have an honest-to-Spolsky stack overflow!

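    Try compiling your program with gcc's -fstack-check option, which generates code to verify that you do not run past the end of the stack. If your arrays are too big to allocate on the stack, an Ada program gets a Storage_Error exception; a C or C++ program will typically be stopped by a signal instead.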

    I think it's a good bet, though, as 5000*500*3 doubles (8 bytes each) comes to around 60 megs, far more than the default stack limit on typical platforms (8 MB is a common default on Linux). You'll have to allocate your big arrays on the heap.
