Row-major vs Column-major confusion

后端未结

关注

 9  1406

离开以前

I\'ve been reading a lot about this, the more I read the more confused I get.

My understanding: In row-major rows are stored contiguously in memory, in column-major colu

相关标签:

9条回答

太阳男子

2021-01-31 19:41
Let's look at algebra first; algebra doesn't even have a notion of "memory layout" and stuff.

From an algebraic pov, a MxN real matrix can act on a |R^N vector on its right side and yield a |R^M vector.

Thus, if you were sitting in an exam and given a MxN Matrix and a |R^N vector, you could with trivial operations multiply them and get a result - whether that result is right or wrong will not depend on whether the software your professor uses to check your results internally uses column-major or a row-major layout; it will only depend on if you calculated the contraction of each row of the matrix with the (single) column of the vector properly.

To produce a correct output, the software will - by whatever means - essentially have to contract each row of the Matrix with the column vector, just like you did in the exam.

Thus, the difference between software that aligns column-major and software that uses row-major-layout is not what it calculates, but just how.

To put it more pecisely, the difference between those layouts with regard to the topcial single row's contraction with the column vector is just the means to determine
```
Where is the next element of the current row?
```
- For a row-major-layout it's the element just in the next bucket in memory
- For a column-major-layout it's the element in the bucket M buckets away.
And thats it.

To show you how that column/row magic is summoned in practice:

You haven't tagged your question with "c++", but because you mentioned 'glm', I assume that you can get along with C++.

In C++'s standard library there's an infamous beast called valarray, which, besides other tricky features, has overloads of operator[], one of them can take a std::slice ( which is essentially a very boring thing, consisting of just three integer-type numbers).

This little slice thing however, has everything one would need to access a row-major-storage column-wise or a column-major-storage row-wise - it has a start, a length, and a stride - the latter represents the "distance to next bucket" I mentioned.
0 讨论(0)
发布评论:

提交评论
- 加载中...
没有蜡笔的小新

2021-01-31 19:43
Doesn't matter what you use: just be consistent!

Row major or column major is just a convention. Doesn't matter. C uses row major, Fortran uses column. Both work. Use what's standard in your programming language/environment.

Mismatching the two will !@#$ stuff up

If you use row major addressing on a matrix stored in colum major, you can get the wrong element, read past end of the array, etc...
```
Row major: A(i,j) element is at A[j + i * n_columns];  <---- mixing these up will
Col major: A(i,j) element is at A[i + j * n_rows];     <---- make your code fubar
```
It's incorrect to say code to do matrix multiplication is the same for row major and column major

(Of course the math of matrix multiplication is the same.) Imagine you have two arrays in memory:
```
X = [x1, x2, x3, x4]    Y = [y1, y2, y3, y4]
```
If matrices are stored in column major then X, Y, and X*Y are:
```
IF COL MAJOR: [x1, x3  *  [y1, y3    =   [x1y1+x3y2, x1y3+x3y4
               x2, x4]     y2, y4]        x2y1+x4y2, x2y3+x4y4]
```
If matrices are stored in row major then X, Y, and X*Y are:
```
IF ROW MAJOR:  [x1, x2    [y1, y2     = [x1y1+x2y3, x1y2+x2y4;
                x3, x4]    y3, y4]       x3y1+x4y3, x3y2+x4y4];

X*Y in memory if COL major   [x1y1+x3y2, x2y1+x4y2, x1y3+x3y4, x2y3+x4y4]
              if ROW major   [x1y1+x2y3, x1y2+x2y4, x3y1+x4y3, x3y2+x4y4]
```
There's nothing deep going on here. It's just two different conventions. It's like measuring in miles or kilometers. Either works, you just can't flip back and forth between the two without converting!
0 讨论(0)
发布评论:

提交评论
- 加载中...
暗喜

2021-01-31 19:44
I think you are mix up an implementation detail with usage, if you will.

Lets start with a two-dimensional array, or matrix:
```
    | 1  2  3 |
    | 4  5  6 |
    | 7  8  9 |
```
The problem is that computer memory is a one-dimensional array of bytes. To make our discussion easier, lets group the single bytes into groups of four, thus we have something looking like this, (each single, +-+ represents a byte, four bytes represents an integer value (assuming 32-bit operating systems) :
```
   -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    |       |       |       |       |       |       |       |       |  
   -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
       \/                   \       /
      one byte               one integer

    low memory    ------>                          high memory
```
Another way of representing

So, the question is how to map a two dimensional structure (our matrix) onto this one dimensional structure (i.e. memory). There are two ways of doing this.
1. Row-major order: In this order we put the first row in memory first, and then the second, and so on. Doing this, we would have in memory the following:
```
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
With this method, we can find a given element of our array by performing the following arithmetic. Suppose we want to access the $M_{ij}$ element of the array. If we assume that we have a pointer to the first element of the array, say ptr, and know the number of columns say nCol, we can find any element by:
```
     $M_{ij} = i*nCol + j$ 
```
To see how this works, consider M_{02} (i.e. first row, third column -- remember C is zero based.
```
      $M_{02} = 0*3 + 2 = 2
```
So we access the third element of the array.
1. Column-major ordering: In this order we put the first column in memory first, and then the second, and so or. Doing this we would have in memory the following:
```
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   1   |   4   |   7   |   2   |   5   |   8   |   3   |   6   |   9   |
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
SO, the short answer - row-major and column-major format describe how the two (or higher) dimensional arrays are mapped into a one dimensional array of memory.

Hope this helps. T.
0 讨论(0)
发布评论:

提交评论
- 加载中...
死守一世寂寞

2021-01-31 19:45
Ok, so given that the word "confusion" is literally in the title I can understand the level of...confusion.

Firstly, this absolutely is a real problem

Never, EVER succumb to the idea that "it is used be but...PC's nowadays..."

Of the primary issues here are: -Cache eviction strategy (LRU, FIFO, etc.) as @Y.C.Jung was beginning to touch on -Branch prediction -Pipelining (it's depth, etc) -Actual physical memory layout -Size of memory -Architecture of machine, (ARM, MIPS, Intel, AMD, Motorola, etc.)

This answer will focus on the Harvard architecture, Von Neumann machine as it is most applicable to the current PC.

The memory hierarchy:

https://en.wikipedia.org/wiki/File:ComputerMemoryHierarchy.svgis

Is a juxtaposition of cost versus speed.

For today's standard PC system this would be something like: SIZE: 500GB HDD > 8GB RAM > L2 Cache > L1 Cache > Registers. SPEED: 500GB HDD < 8GB RAM < L2 Cache < L1 Cache < Registers.

This leads to the idea of Temporal and Spatial locality. One means how your data is organized, (code, working set, etc.), the other means physically where your data is organized in "memory."

Given that "most" of today's PC's are little-endian (Intel) machines as of late, they lay data into memory in a specific little-endian ordering. It does differ from big-endian, fundamentally.

https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html (covers it rather... swiftly ;) )

(For the simplicity of this example, I am going to 'say' that things happen in single entries, this is incorrect, entire cache blocks are typically accessed and vary drastically my manufacturer, much less model).

So, now that we have that our of the way, if, hypothetically your program demanded 1GB of data from your 500GB HDD, loaded into your 8GB of RAM, then into the cache hierarchy, then eventually registers, where your program went and read the first entry from your freshest cache line just to have your second (in YOUR code) desired entry happen to be sitting in the next cache line, (i.e. the next ROW instead of column you would have a cache MISS.

Assuming the cache is full, because it is small, upon a miss, according to the eviction scheme, a line would be evicted to make room for the line that 'does' have the next data you need. If this pattern repeated you would have a MISS on EVERY attempted data retrieval!

Worse, you would be evicting lines that actually have valid data you are about to need, so you will have to retrieve them AGAIN and AGAIN.

The term for this is called: thrashing

https://en.wikipedia.org/wiki/Thrashing_(computer_science) and can indeed crash a poorly written/error prone system. (Think windows BSOD)....

On the other hand, if you had laid out the data properly, (i.e. Row major)...you WOULD still have misses!

But these misses would only occur at the end of each retrieval, not on EVERY attempted retrieval. This results in orders of magnitude of difference in system and program performance.

Very very simple snippet:
```
#include<stdio.h>

#define NUM_ROWS 1024
#define NUM_COLS 1024

int COL_MAJOR [NUM_ROWS][NUM_COLS];

int main (void){
        int i=0, j=0;
        for(i; i<NUM_ROWS; i++){
                for(j; j<NUM_COLS; j++){
                        COL_MAJOR[j][i]=(i+j);//NOTE i,j order here!
                }//end inner for
        }//end outer for
return 0;
}//end main
```
Now, compile with: gcc -g col_maj.c -o col.o

Now, run with: time ./col.o real 0m0.009s user 0m0.003s sys 0m0.004s

Now repeat for ROW major:
```
#include<stdio.h>

#define NUM_ROWS 1024
#define NUM_COLS 1024

int ROW_MAJOR [NUM_ROWS][NUM_COLS];

int main (void){
        int i=0, j=0;
        for(i; i<NUM_ROWS; i++){
                for(j; j<NUM_COLS; j++){
                        ROW_MAJOR[i][j]=(i+j);//NOTE i,j order here!
                }//end inner for
        }//end outer for
return 0;
}//end main
```
Compile: terminal4$ gcc -g row_maj.c -o row.o Run: time ./row.o real 0m0.005s user 0m0.001s sys 0m0.003s

Now, as you can see, the Row Major one was significantly faster.

Not convinced? If you would like to see a more drastic example: Make the matrix 1000000 x 1000000, initialize it, transpose it and print it to stdout. ```

(Note, on a *NIX system you WILL need to set ulimit unlimited)

ISSUES with my answer: -Optimizing compilers, they change a LOT of things! -Type of system -Please point any others out -This system has an Intel i5 processor
0 讨论(0)
发布评论:

提交评论
- 加载中...
醉酒成梦

2021-01-31 19:54

Today there is no reason to use other then column-major order, there are several libraries that support it in c/c++ (eigen,armadillo,...). Furthermore column-major order is more natural, eg. pictures with [x,y,z] are stored slice by slice in file, this is column-major order. While in two dimension it may be confusing to choose better order, in higher dimension it is quite clear that column-major order is the only solution in many situation.

Authors of C created concept of arrays but perhaps they did not expect that somebody had used it as a matrices. I would be shocked myself if I saw how arrays are used in place where already everything was made up in fortran and column-major order. I think that row-major order is simply alternative to column-major order but only in situation where it is really needed (for now I don't know about any).

It is strangely that still someone creates library with row-major order. It is unnecessary waste of energy and time. I hope that one day everything will be column-major order and all confusions simply disappear.

0 讨论(0)
发布评论:

提交评论
- 加载中...
臣服心动

2021-01-31 19:57

You are right. it doesn't matter if a system stored the data in a row-major structure or a column-major one. It is just like a protocol. Computer : "Hey, human. I'm going to store your array this way. No prob. Huh?" However, when it comes to performance, it matters. consider the following three things.

1. most arrays are accessed in row-major order.

2. When you access memory, it is not directly read from memory. You first store some blocks of data from memory to cache, then you read data from cache to your processor.

3. If the data you want does not exist in cache, cache should re-fetch the data from the memory

When a cache fetches data from memory, locality is important. That is, if you store data sparsely in memory, your cache should fetch data from memory more often. This action corrupts your programs performance because accessing memory is far slower(over 100times!) then accessing cache. The less you access memory, the faster your program. So, this row-major array is more efficient because accessing its data is more likely to be local.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页

Row-major vs Column-major confusion

Doesn't matter what you use: just be consistent!

Mismatching the two will !@#$ stuff up

It's incorrect to say code to do matrix multiplication is the same for row major and column major