Difference between distributed and non-distributed arrays in MATLAB

Submitted by 蹲街弑〆低调 on 2019-12-04 20:23:27

Distributed arrays are stored on the workers, not the client, and operations on them are carried out in parallel by the workers; that is the whole point of them.

The difference between distributed and codistributed arrays is only one of perspective. From the point of view of the client, they are distributed arrays; from the point of view of the workers, they are codistributed arrays.

To illustrate, first start a pool:

>> parpool('local',2)

Create an array:

>> W = ones(6,6);

W is stored on the client.

Now create a distributed array from W:

>> V = distributed(W);

V is stored on the workers, split across them. You still have access to V from the client, but when you access it, V is pulled back from the workers.

>> V
V =
     1     1     1     1     1     1
     1     1     1     1     1     1
     1     1     1     1     1     1
     1     1     1     1     1     1
     1     1     1     1     1     1
     1     1     1     1     1     1 

Note that in the Workspace Browser, V appears as a 6x6 distributed array, not as a 6x6 double like W.
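The same distinction can be checked from the command line instead of the Workspace Browser. A minimal sketch, assuming the Parallel Computing Toolbox is installed; the `gcp('nocreate')` guard, which starts the two-worker pool only if none is open, is an addition for self-containment and not part of the original walkthrough:

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end

W = ones(6,6);            % ordinary array held by the client
V = distributed(W);       % same data, now split across the workers

disp(class(W))            % double
disp(class(V))            % distributed
disp(classUnderlying(V))  % double: the element type stored inside V
disp(isdistributed(V))    % true
disp(size(V))             % the full 6x6 size, not a per-worker size
```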

Now although from the point of view of the client V is a distributed array, from the point of view of the workers, V is a codistributed array.

>> spmd; disp(V); end
Lab 1: 
          LocalPart: [6x3 double]
      Codistributor: [1x1 codistributor1d]

Lab 2: 
          LocalPart: [6x3 double]
      Codistributor: [1x1 codistributor1d]

You can see that V is codistributed and that only half of it (6x3) is stored on each worker.
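To look at the per-worker pieces directly, `getLocalPart` returns each worker's slice as an ordinary array. A sketch under the same assumptions as above; `localV` is an illustrative name, and `labindex` matches the "Lab" labels in the output (newer releases call it `spmdIndex`). The slice size depends on the pool size; with two workers it should be 6x3:

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end
V = distributed(ones(6,6));

spmd
    % Inside spmd each worker sees V as a codistributed array.
    localV = getLocalPart(V);   % this worker's slice, an ordinary double
    fprintf('Worker %d: codistributed = %d, local part is %dx%d\n', ...
        labindex, iscodistributed(V), size(localV, 1), size(localV, 2));
end
% On the client, localV is a Composite: localV{1} is worker 1's slice.
```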

When you do something with V, that happens on the workers, in parallel, and the results are stored on the workers as a distributed/codistributed array:

>> spmd; T = V*2; end
>> spmd; disp(T); end
Lab 1: 
          LocalPart: [6x3 double]
      Codistributor: [1x1 codistributor1d]

Lab 2: 
          LocalPart: [6x3 double]
      Codistributor: [1x1 codistributor1d]

You have access to T from the client just as you did with V, but to explicitly bring it back, use gather:

>> S = gather(T);

Note that S is now a 6x6 double, not a distributed array.
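Many functions are overloaded for distributed arrays, so the doubling step can also be issued directly from the client without an `spmd` block; it still executes on the workers. A sketch under the same assumptions as the walkthrough (two-worker local pool, started only if none is open):

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end
W = ones(6,6);
V = distributed(W);

T = V * 2;                % no spmd needed: mtimes is overloaded for distributed arrays
S = gather(T);            % bring the result back as an ordinary double

disp(class(T))            % distributed
disp(class(S))            % double
disp(isequal(S, W * 2))   % true: same values as the client-side computation
```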

Regarding 1: there are surely other minor differences, but at least indexing and manipulating the elements should work the same way.

Regarding 2: you could easily try this out yourself. In any case, the result is a Composite: the normal array is copied to each worker when the spmd block executes, the calculation is performed once on every worker, and each worker's result is stored. I would use the "normal" type for constant input data (parameters) and distributed for the variables the output is computed from (and which determine its size).
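A minimal sketch of the Composite case, assuming an open pool (started here only if none exists); `a` and `b` are illustrative names. Because `a` is an ordinary array, every worker computes its own copy of `b`, and on the client `b` is a Composite indexed by worker with curly braces:

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end
a = 5;               % ordinary (non-distributed) array on the client

spmd
    b = a * 2;       % each worker receives a copy of a and repeats the calculation
end

disp(class(b))       % Composite
disp(b{1})           % worker 1's own result: 10
```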

Example:

x = distributed(1:100); % variable the output is calculated from -> distributed
a = 5; % amplitude (constant parameter -> "normal" array)
spmd
  y = a * sin(x); % each worker computes its own part of y
end
y % on the client, y is a distributed array

This also explains the purpose of distributed: to enable parallel computation on a matrix.
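To check that the distributed computation gives the same numbers as the ordinary serial one, the result can be gathered and compared. A sketch assuming an open pool (started only if none exists); `yClient` and `maxErr` are illustrative names, and the comparison uses a tolerance rather than exact equality as a precaution:

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end
x = distributed(1:100);   % data the output is computed from -> distributed
a = 5;                    % constant parameter -> ordinary array
spmd
    y = a * sin(x);       % each worker computes only its own slice of y
end
yClient = gather(y);      % y is distributed on the client; gather returns a 1x100 double

% Compare with the serial computation, allowing for floating-point round-off.
maxErr = max(abs(yClient - a * sin(1:100)));
disp(maxErr)
```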

Regarding 3: distributed means the elements are spread over the workers. Codistributed means the elements are also spread, but in the same way as some other array that is also distributed (which, among other things, implies equal size). I guess (but am not sure) that the codistributed property persists as long as the parallel pool stays open, but from outside the spmd block these arrays can only be accessed as distributed arrays.

The documentation says:

Codistributed arrays on workers that you create inside spmd statements or from within task functions of communicating job can be accessed as distributed arrays on the client.
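The documented behaviour can be illustrated by creating a codistributed array directly inside `spmd` and then inspecting it on the client. A sketch assuming an open pool (started only if none exists); `C`, `isCod` and `Cfull` are illustrative names, and `codistributed.ones` is one of the toolbox's codistributed constructors:

```matlab
% Assumption: start a 2-worker local pool only if no pool is open yet.
if isempty(gcp('nocreate'))
    parpool('local', 2);
end
spmd
    C = codistributed.ones(4, 4);   % created on the workers as a codistributed array
    isCod = iscodistributed(C);     % true inside the spmd block
end
disp(class(C))        % on the client: distributed
disp(isCod{1})        % true: worker 1 saw a codistributed array
Cfull = gather(C);    % ordinary 4x4 double on the client
```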
