Question
I am doing parallel computations with MATLAB parfor. The code structure looks roughly like this:
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
mid_fitnesses = zeros(1, numel(new_indi_idices));
right_fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        mid_fitness = 100+mid_dsp;
        right_fitness = 100+right_dsp;
    else % porosity not even qualified
        mid_fitness = 0;
        right_fitness = 0;
    end
    mid_fitnesses(idx) = mid_fitness;
    right_fitnesses(idx) = right_fitness;
    fprintf('Done.\n');
    pause(0.01); % for break
end
I encountered the following weird error.
Error using parallel.internal.pool.deserialize (line 9)
Bad version or endian-key
Error in distcomp.remoteparfor/getCompleteIntervals (line 141)
    origErr = parallel.internal.pool.deserialize(intervalError);
Error in nsga2 (line 57)
    parfor idx = 1:numel(new_indi_idices) % only assess the necessary
How should I fix it? A quick Google search returns no solution.
Update 1
Stranger still, the following snippet works perfectly under exactly the same settings on the same HPC. I think there might be some subtle difference between the two that causes one to work and the other to fail. The working snippet:
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        displacement = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        fitness = 100+displacement;
    else % porosity not even qualified
        fitness = 0;
    end
    fitnesses(idx) = fitness;
    %fprintf('Done.\n', gen, idx);
    pause(0.01); % for break
end
pop(3, new_indi_idices) = num2cell(fitnesses);
Update 2
Suspecting that the line [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]); was causing the trouble, I replaced it with
mid_dsp = rand();
right_dsp = rand();
Then it works! This shows that the error is indeed triggered by that particular line. However, I have tested the function, and it returns two numbers correctly. Since the function returns values just as rand() does, I can't see any difference. This confuses me even more.
Answer 1:
I had the same issue, and it turned out that MATLAB 2015 reserves all the memory it needs for each of the loops in the parfor, which results in a memory shortage. The error message is misleading. After fine-tuning the code in the loop and providing 120 GB of virtual memory on the SSD through the Pagefile system setting in Windows 10, the parfor executed beautifully.
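Because the message hides the underlying failure, one way to see what a worker actually ran into is to catch the error inside the loop body and hand it back as plain text. The following is only a minimal sketch: compute_stub and the sample bitmaps are hypothetical stand-ins for compute_displacement and the real data, and it assumes the worker survives long enough to reach the catch block, which may not hold if it runs out of memory completely.

```matlab
% Hypothetical stand-in for compute_displacement so the sketch runs on its own.
compute_stub = @(bitmap, res) deal(sum(bitmap(:)), numel(bitmap));

bitmaps = {ones(2), zeros(3), eye(4)};   % made-up inputs, not the real data
n = numel(bitmaps);
mid_dsps = nan(1, n);
right_dsps = nan(1, n);
err_reports = cell(1, n);                % one text report per iteration

parfor idx = 1:n
    % Defaults, so every temporary is defined on both code paths.
    m = NaN;
    r = NaN;
    report = '';
    try
        [m, r] = feval(compute_stub, bitmaps{idx}, '1/4');
    catch err
        % Return only text; the error object itself is what failed to
        % deserialize on the client in the original trace.
        report = getReport(err);
    end
    mid_dsps(idx) = m;
    right_dsps(idx) = r;
    err_reports{idx} = report;
end

% Indices of iterations whose body threw an error.
failed = find(~cellfun(@isempty, err_reports));
fprintf('%d of %d iterations failed.\n', numel(failed), n);
```

If failed is non-empty, printing err_reports{failed(1)} should show the worker-side message, which may reveal an out-of-memory error rather than the endian-key one.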
Answer 2:
After working for a while on my own similar code block, I've concluded that this is actually a memory issue.
I'm using a 6-core 4 GHz CPU with 8 GB of RAM, and I have seen this issue (on MATLAB 2014b) when I set the worker count high. I did not have any problems with low worker counts.
When I use 6 or more workers (which is not ideal, I know), memory consumption is high and this error message pops up sporadically. I have also seen various out-of-memory errors in my tests.
I haven't seen the error so far when using 5 or fewer workers, and I'm fairly sure some memory limit (possibly inside a Java code block) causes this issue by compromising the integrity (or existence) of some of the results.
I hope you can resolve this issue by reducing the worker count.
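As a rough illustration of reducing the worker count, the sketch below opens a pool of a chosen size before the loop runs. It assumes the Parallel Computing Toolbox with parpool (available from R2013b) and a cluster profile that allows at least n_workers workers. The value 4 and the placeholder loop body are illustrative only and do not come from the question.

```matlab
n_workers = 4;                  % assumed value; lower it if memory is tight

pool = gcp('nocreate');         % current pool, or empty if none is open
if ~isempty(pool) && pool.NumWorkers ~= n_workers
    delete(pool);               % close a pool of the wrong size
    pool = [];
end
if isempty(pool)
    pool = parpool(n_workers);  % start a pool with the reduced worker count
end

results = zeros(1, 8);
parfor idx = 1:numel(results)
    results(idx) = idx^2;       % placeholder for the real fitness evaluation
end
```

For a single loop, the form parfor (idx = 1:n, n_workers) also caps the number of workers without resizing the pool.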
Source: https://stackoverflow.com/questions/24592602/bad-version-or-endian-key-in-matlab-parfor