问题
accumarray
uses two rows of indices to create a matrix with elements on the location of valid index pairs with a value assigned by the specified function, e.g.:
A = [11:20];
B = flipud([11:20]);
C = 1:10;
datamatrix = accumarray([A B],C);
This way datamatrix
will be a 20x20
matrix with values. If the values of A
and B
however are very large, this will result in a mostly empty matrix, with a small batch of data in the far corner. To circumvent this, one might set accumarray
to issparse
:
sparsedatamatrix = accumarray([A B],C,[],@sum,[],true);
This will save a lot of memory in case min(A)
and/or min(B)
is/are very large.
My problem, however, is that I have a Mx7
matrix, with M~1e8
, on which I want to collect the means of columns three through seven based upon indexing in the first two columns and the standard deviation of the third column based upon the third as well:
result = accumarray([data(:,1) data(:,2)],data(:,3),[],@std);
I want to save this back into a table, structured as [X Y Z std R G B I]
, where X
and Y
are the indices, Z
is the average height of that pixel,R
, G
, B
and I
are mean values (colour and intensity) per pixel and std
is the standard deviation of heights (i.e. the roughness). Using the issparse
in this case does not help, since I transform my matrices resulting from accumarray
using repmat
.
The point of this code is to estimate the height, roughness, colour and intensity of a piece of land from a point cloud. I rounded the coordinates in X and Y to create a grid and now need those average values per grid cell, but output as a "table" (not the MATLAB data type, but a 2D array which is not the default matrix output).
So, to conclude with the question:
Is there a way for accumarray
or a similar function to output this table without intermediate (potentially very large) matrix?
Code below:
Xmax = max(Originaldata(:,1));
Ymax = max(Originaldata(:,2));
X_avg_grid=(Edgelength:Edgelength:Xmax*Edgelength)+Xorig;
TestSet = zeros(Xmax*Ymax,9);
xx = [1:length(X_avg_grid)]'; %#ok<*NBRAK>
TestSet(:,1) = repmat(xx,Ymax,1);
ll = 0:Xmax:Xmax*Ymax;
for jj = 1:Ymax
TestSet(ll(jj)+1:ll(jj+1),2) = jj;
end
for ll = 1:7
if ll == 2
tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@std);
tempdat = reshape(tempdat,[],1);
TestSet(:,ll+2) = tempdat;
elseif ll == 7
tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],1);
tempdat = reshape(tempdat,[],1);
TestSet(:,ll+2) = tempdat;
elseif ll == 1
tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@mean);
tempdat = reshape(tempdat,[],1);
TestSet(:,ll+2) = tempdat;
else
tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,ll+1),[],@mean);
tempdat = reshape(tempdat,[],1);
TestSet(:,ll+2) = tempdat;
end
end
TestSet = TestSet(~(TestSet(:,9)==0),:);
The ninth column here is just the amount of points per cell.
Originaldata =
19 36 2.20500360107422 31488 31488 31488 31611
20 37 2.26400360107422 33792 33792 34304 33924
20 37 2.20000360107422 33536 33536 34048 33667
19 36 2.20500360107422 34560 34560 34560 34695
20 36 2.23300360107422 32512 32512 33024 32639
21 38 2.22000360107422 31744 31488 33024 31611
21 37 2.20400360107422 32512 32768 33792 32896
21 37 2.24800360107422 29696 29440 30720 29555
21 38 2.34800360107422 32768 32768 32768 32639
21 37 2.23000360107422 33024 33024 33536 33153
So all points on the same X,Y (e.g. [19 36]
or [21 37]
) are averaged (height, RGB, intensity in that order) and of the values in the third column the standard deviation is also desired:
Result =
19 36 2.2050036 0.00 33024 33024 33024 33153
21 37 2.227336934 0.02212088 31744 31744 32682.66 31868
and so forth for the rest of the data.
I updated my code to the latest version I have. This reduced memory overhead quite a bit, as the function now creates the grids one after another as opposed to all at once. However, the code is running in parallel so there are still eight simultaneous grids created, so a solution would still be appreciated.
回答1:
A sketch of a solution using linear indices and 2D sparse matrix
lind = Originaldata(:,1) + max( Originaldata(:,1) ) * ( Originaldata(:,2) - 1 );
daccum(7,:) = accumarray( lind, 1, [], @sum, [], true ); %// start with last one to pre-allocate all daccum
daccum(1,:) = accumarray( lind, Originaldata(:,3), [], @mean, [], true );
daccum(2,:) = accumarray( lind, Originaldata(:,3), [], @std, [], true );
daccum(3,:) = accumarray( lind, Originaldata(:,4), [], @mean, [], true );
daccum(4,:) = accumarray( lind, Originaldata(:,5), [], @mean, [], true );
daccum(5,:) = accumarray( lind, Originaldata(:,6), [], @mean, [], true );
daccum(6,:) = accumarray( lind, Originaldata(:,7), [], @mean, [], true );
Now you can get only what you need
inter = [Originaldata(:,1), Originaldata(:,2), full( daccum(:,lind) )' ];
回答2:
You can first use unique with the 'rows'
option to find the indices of the unique pairs of X and Y coordinates, then instead use those indices as the subscript input in your calls to accumarray (you'll have to call it separately for each column, since accumarray
doesn't handle matrix inputs):
[xyPairs, ~, index] = unique(Originaldata(:, 1:2), 'rows');
nPairs = max(index);
Result = [xyPairs ...
accumarray(index, Originaldata(:, 3), [nPairs 1], @mean) ...
accumarray(index, Originaldata(:, 3), [nPairs 1], @std) ...
accumarray(index, Originaldata(:, 4), [nPairs 1], @mean) ...
accumarray(index, Originaldata(:, 5), [nPairs 1], @mean) ...
accumarray(index, Originaldata(:, 6), [nPairs 1], @mean) ...
accumarray(index, Originaldata(:, 7), [nPairs 1], @mean) ...
accumarray(index, ones(size(index)), [nPairs 1], @sum)];
回答3:
You could pre-process the data.
One thing you can achieve this way is remove undesired lines (such as those having two or less occurrences) so that you don't have to deal with 0
standard deviation:
%// Count occurences:
combined_coord = Originaldata(:,1)*1E6+Originaldata(:,2); %// "concatenating" the coords
[C,~,ic] = unique(combined_coord);
occurences = [C accumarray(ic,1)];
%// Find all points that have <=2 occurences:
coords_to_remove = occurences((occurences(:,2)<=2),1);
%// Find valid lines:
valid_lns = ~sum(bsxfun(@eq,combined_coord,coords_to_remove'),2); %'
%// Filter original data:
new_data = Originaldata(valid_lns,:);
来源:https://stackoverflow.com/questions/32182551/letting-accumarray-output-a-table