I am parsing a large text file full of data and then saving it to disk as a *.mat file so that I can easily load in only parts of it (see here for more information on readin
This seems like a bug to me. A workaround is to write in chunks to pre-allocated arrays.
Start off by pre-allocating:
fid = fopen('01_hit12.par', 'r');
data = fread(fid, inf, 'uint8');
nlines = nnz(data == 10) + 1;
fclose(fid);
matObj.moleculeNumber = zeros(1,nlines,'uint8');
matObj.isotopeologueNumber = zeros(1,nlines,'uint8');
matObj.vacuumWavenumber = zeros(1,nlines,'double');
matObj.lineIntensity = zeros(1,nlines,'double');
matObj.airWidth = zeros(1,nlines,'single');
matObj.selfWidth = zeros(1,nlines,'single');
matObj.lowStateE = zeros(1,nlines,'single');
matObj.tempDependWidth = zeros(1,nlines,'single');
matObj.pressureShift = zeros(1,nlines,'single');
Then to write in chunks of 10000, I modified your code as follows:
... % your code plus pre-alloc first
bs = 10000;
while ischar(hitranTemp)
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
for ii = 1:bs,
hitran{ii} = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6u%2u%2u%2u%2u%2u%2 u%1c%7f%7f','delimiter','','whitespace','');
hitranTemp = fgetl(fidr);
if hitranTemp==-1, bs=ii; break; end
end
% this part really ugly, sorry! trying to keep it compact...
matObj.moleculeNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(@(c)c{1},hitran),1:bs));
matObj.isotopeologueNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(@(c)c{2},hitran),1:bs));
matObj.vacuumWavenumber(1,k:k+bs-1) = builtin('_paren',cellfun(@(c)c{3},hitran),1:bs);
matObj.lineIntensity(1,k:k+bs-1) = builtin('_paren',cellfun(@(c)c{4},hitran),1:bs);
matObj.airWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{5},hitran),1:bs));
matObj.selfWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{6},hitran),1:bs));
matObj.lowStateE(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{7},hitran),1:bs));
matObj.tempDependWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{8},hitran),1:bs));
matObj.pressureShift(1,k:k+bs-1) = single(builtin('_paren',cellfun(@(c)c{9},hitran),1:bs));
k = k + bs;
fprintf('.');
end
fclose(fidr);
The final size on disk is 21,393,408 bytes. The usage breaks down as,
>> S = whos('-file','01_hit12.mat');
>> fileBytes = sum([S.bytes]);
>> T = dir(which('01_hit12.mat'));
>> diskBytes = T.bytes; ratio = diskBytes/fileBytes;
>> fprintf('%10d whos\n%10d disk\n%10.6f\n',fileBytes,diskBytes,ratio)
8531608 whos
21389582 disk
2.507099
Still fairly inefficient, but not out of control.