Does anyone here know how to delete a variable from a MATLAB file? I know that you can add variables to an existing MATLAB file using the save -append
10 GB of data? Updating multi-variable MAT files could get expensive due to MAT format overhead. Consider splitting the data up and saving each variable to a different MAT file, using directories for organization if necessary. Even if you had a convenient function to delete variables from a MAT file, it would be inefficient. The variables in a MAT file are laid out contiguously, so replacing one variable can require reading and writing much of the rest. If they're in separate files, you can just delete the whole file, which is fast.
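For reference, here's roughly what deleting a variable from a single multi-variable MAT file comes down to. This is only a minimal sketch: rmvar_from_matfile is a hypothetical helper name (not a built-in), and it assumes the whole file fits in memory.

```matlab
function rmvar_from_matfile(matFile, varName)
% Hypothetical helper (not part of MATLAB): drop one variable from a MAT
% file by loading everything, removing the field, and rewriting the file.
s = load(matFile);                  % reads every variable into a struct
if ~isfield(s, varName)
    error('Variable %s not found in %s', varName, matFile);
end
s = rmfield(s, varName);            % drop the unwanted variable
save(matFile, '-struct', 's');      % overwrite the file with what is left
end
```

Every call reads and rewrites all of the remaining variables, which is the cost described above.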
To see this in action, try this code, stepping through it in the debugger while using something like Process Explorer (on Windows) to monitor its I/O activity.
function replace_vars_in_matfile
x = 1;
% Random dummy data; zeros would compress really well and throw off results
y = randi(intmax('uint8')-1, 100*(2^20), 1, 'uint8');
tic; save test.mat x y; toc;
x = 2;
tic; save -append test.mat x; toc;
y = y + 1;
tic; save -append test.mat y; toc;
On my machine, the results look like this. (Read and Write are cumulative, Time is per operation.)
                    Read (MB)   Write (MB)   Time (sec)
before any write:          25            0
first write:               25          105          3.7
append x:                 235          315          3.6
append y:                 235          420          3.8
Notice that updating the small x variable costs more I/O than updating the large y: about 210 MB read and 210 MB written for x, versus no additional reads and about 105 MB written for y. Much of this I/O activity is "redundant" housekeeping work to keep the MAT file format organized, and will go away if each variable is in its own file.
Also, try to keep these files on the local filesystem; it'll be a lot faster than network drives. If they need to go on a network drive, consider doing the save() and load() on local temp files (maybe chosen with tempname()) and then copying them to/from the network drive. MATLAB's save and load tend to be much faster with local filesystems, enough so that local save/load plus a copy can be a substantial net win.
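Here's one way the temp-file approach could look. Treat it as a sketch under assumptions: save_via_local_temp, load_via_local_temp, and delete_if_exists are hypothetical helper names, and the variables are passed around as a struct for simplicity.

```matlab
function save_via_local_temp(destFile, s)
% Hypothetical helper: save the fields of struct s as variables in destFile
% by writing a local temp file first, then copying it to the destination.
localFile = [tempname '.mat'];                 % unique path under tempdir
cleanup = onCleanup(@() delete_if_exists(localFile));
save(localFile, '-struct', 's');
[ok, msg] = copyfile(localFile, destFile);
if ~ok
    error('Copy to %s failed: %s', destFile, msg);
end
end

function s = load_via_local_temp(srcFile)
% Hypothetical helper: copy srcFile to a local temp file, then load it.
localFile = [tempname '.mat'];
cleanup = onCleanup(@() delete_if_exists(localFile));
[ok, msg] = copyfile(srcFile, localFile);
if ~ok
    error('Copy from %s failed: %s', srcFile, msg);
end
s = load(localFile);
end

function delete_if_exists(f)
% Remove the temp file if it was created.
if exist(f, 'file')
    delete(f);
end
end
```

Usage would be something like save_via_local_temp('\\server\share\data\y.mat', struct('y', y)), where the network path is made up for illustration. Whether this beats saving directly to the share depends on your network, so time both.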
Here's a basic implementation that will let you save variables to separate files using the familiar save() and load() signatures. They're prefixed with "d" to indicate they're the directory-based versions. They use some tricks with evalin() and assignin(), so I thought it would be worth posting the full code.
function dsave(file, varargin)
%DSAVE Like save, but each var in its own file
%
% dsave filename var1 var2 var3...
if nargin < 1 || isempty(file); file = 'matlab'; end
% Pull out the optional -struct flag; everything left must be a variable name
[tfStruct,loc] = ismember({'-struct'}, varargin);
args = varargin;
args(loc(tfStruct)) = [];
if ~all(cellfun(@isvarname, args))
    error('Invalid arguments. Usage: dsave filename <-struct> var1 var2 var3 ...');
end
if tfStruct
    % Save the fields of the named struct as individual variables
    structVarName = args{1};
    s = evalin('caller', structVarName);
else
    varNames = args;
    if isempty(args)
        % No names given: save everything in the caller's workspace
        w = evalin('caller','whos');
        varNames = { w.name };
    end
    % Build struct('a',{a},'b',{b},...) and evaluate it in the caller
    captureExpr = ['struct(' ...
        join(',', cellfun(@(x){sprintf('''%s'',{%s}',x,x)}, varNames)) ')'];
    s = evalin('caller', captureExpr);
end
% Use Java checks to avoid partial path ambiguity
jFile = java.io.File(file);
if ~jFile.exists()
    ok = mkdir(file);
    if ~ok
        error('failed creating dsave dir %s', file);
    end
elseif ~jFile.isDirectory()
    error('Cannot save: destination exists but is not a dir: %s', file);
end
% Write each variable to its own <name>.mat file inside the directory
names = fieldnames(s);
for i = 1:numel(names)
    varFile = fullfile(file, [names{i} '.mat']);
    varStruct = struct(names{i}, {s.(names{i})});
    save(varFile, '-struct', 'varStruct');
end
function out = join(Glue, Strings)
Strings = cellstr(Strings);
if isempty( Strings )
    out = '';
elseif length( Strings ) == 1
    out = Strings{1};
else
    Glue = sprintf( Glue ); % Support escape sequences
    out = strcat( Strings(1:end-1), { Glue } );
    out = [ out{:} Strings{end} ];
end
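The evalin() trick in dsave is easier to see in isolation. The sketch below is not part of dsave: capture_caller_vars is a hypothetical name, and it uses strjoin (assuming R2013a or later) in place of the local join() helper.

```matlab
function s = capture_caller_vars(varargin)
% Hypothetical helper: collect the named caller variables into one struct.
% For names 'a' and 'b' it builds the expression struct('a',{a},'b',{b}).
% Wrapping each value in {} keeps cell-valued variables intact; without it,
% struct('a', {1,2}) would create a 1-by-2 struct array instead.
parts = cellfun(@(n) sprintf('''%s'',{%s}', n, n), varargin, ...
    'UniformOutput', false);
expr = ['struct(' strjoin(parts, ',') ')'];
s = evalin('caller', expr);         % evaluated in the caller's workspace
end
```

For example, with a = {1, 2} and b = 'hi' in the calling workspace, capture_caller_vars('a', 'b') returns a scalar struct whose field a holds the cell array and whose field b holds the string.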
Here's the load() equivalent.
function out = dload(file,varargin)
%DLOAD Like load, but each var in its own file
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
if ~exist(file, 'dir')
    error('Not a dsave dir: %s', file);
end
if isempty(varNames)
    % No names given: load every .mat file in the directory.
    % dir() is used instead of ls(), whose output format differs by platform.
    d = dir(fullfile(file, '*.mat'));
    varNames = regexprep({d.name}, '\.mat$', '');
end
out = struct;
for i = 1:numel(varNames)
    name = varNames{i};
    tmp = load(fullfile(file, [name '.mat']));
    out.(name) = tmp.(name);
end
if nargout == 0
    % No output requested: define the variables in the caller, like load()
    for i = 1:numel(varNames)
        assignin('caller', varNames{i}, out.(varNames{i}));
    end
    clear out
end
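The nargout/assignin() pattern at the end of dload can be tried on its own. This is just an illustrative sketch with a hypothetical name, unpack_demo; it mimics how load() either returns a struct or defines variables in the caller.

```matlab
function out = unpack_demo(s)
% Hypothetical helper: if an output is requested, return struct s as is;
% otherwise create one variable per field in the caller's workspace.
if nargout == 0
    names = fieldnames(s);
    for i = 1:numel(names)
        assignin('caller', names{i}, s.(names{i}));
    end
else
    out = s;
end
end
```

Calling unpack_demo(struct('p', 3)); defines p in the calling workspace, while r = unpack_demo(struct('p', 3)); leaves the workspace alone and returns the struct.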
dwhos() is the rough equivalent of whos('-file'); it returns just the variable names.
function out = dwhos(file)
%DWHOS List variable names in a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
% dir() is used instead of ls(), whose output format differs by platform
d = dir(fullfile(file, '*.mat'));
out = regexprep({d.name}, '\.mat$', '');
And ddelete() to delete the individual variables like you asked.
function ddelete(file,varargin)
%DDELETE Delete variables from a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
for i = 1:numel(varNames)
    delete(fullfile(file, [varNames{i} '.mat']));
end