Deleting variables from a .mat file

無奈伤痛 2021-01-04 02:05

Does anyone here know how to delete a variable from a MATLAB file? I know that you can add variables to an existing MATLAB file using the save -append

4 answers
  •  时光说笑
    2021-01-04 02:22

    10 GB of data? Updating multi-variable MAT files could get expensive due to MAT format overhead. Consider splitting the data up and saving each variable to a different MAT file, using directories for organization if necessary. Even if you had a convenient function to delete variables from a MAT file, it would be inefficient. The variables in a MAT file are laid out contiguously, so replacing one variable can require reading and writing much of the rest. If they're in separate files, you can just delete the whole file, which is fast.

    To see this in action, try this code, stepping through it in the debugger while using something like Process Explorer (on Windows) to monitor its I/O activity.

    function replace_vars_in_matfile
    
    x = 1;
    % Random dummy data; zeros would compress really well and throw off results
    y = randi(intmax('uint8')-1, 100*(2^20), 1, 'uint8');
    
    tic; save test.mat x y; toc;
    x = 2;
    tic; save -append test.mat x; toc;
    y = y + 1;
    tic; save -append test.mat y; toc;
    

    On my machine, the results look like this. (Read and Write are cumulative, Time is per operation.)

                        Read (MB)      Write (MB)       Time (sec)
    before any write:   25             0
    first write:        25             105              3.7
    append x:           235            315              3.6
    append y:           235            420              3.8
    

    Notice that updating the small x variable is more expensive than updating the large y. Much of this I/O activity is "redundant" housekeeping work to keep the MAT file format organized, and will go away if each variable is in its own file.
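
    For comparison, if the data has to stay in a single MAT file, one way to drop a variable is to load everything into a struct, remove the field, and rewrite the file. This is only a minimal sketch: the file and variable names are made up for illustration, and it pulls the entire file into memory, which may not be practical for data on the order of 10 GB.

    ```matlab
    % Throwaway MAT file with two variables (names are illustrative only)
    matFile = [tempname() '.mat'];
    x = 1;
    y = rand(3);
    save(matFile, 'x', 'y');

    % "Delete" y: read every variable, drop the field, rewrite the whole file
    s = load(matFile);              % all variables become fields of s
    s = rmfield(s, 'y');            % remove the unwanted variable
    save(matFile, '-struct', 's');  % overwrite the file with the remaining fields

    % List what is left in the file
    info = whos('-file', matFile);
    disp({info.name});
    ```

    Every variable is read and written again here, which is the cost the per-variable-file layout avoids.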

    Also, try to keep these files on the local filesystem; it'll be a lot faster than network drives. If they need to go on a network drive, consider doing the save() and load() on local temp files (maybe chosen with tempname()) and then copying them to/from the network drive. MATLAB's save and load tend to be much faster with local filesystems, enough so that local save/load plus a copy can be a substantial net win.
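
    The local-temp-file approach could look roughly like the sketch below. The "network" location here is just another temp directory standing in for a real share, and the file name results.mat is an assumption for illustration; substitute your own paths.

    ```matlab
    % Stand-in for a network share (assumption: replace with your real path)
    networkDir = tempname();
    mkdir(networkDir);
    networkFile = fullfile(networkDir, 'results.mat');

    y = rand(1000);

    % Save: write to a local temp file first, then copy it to the share
    localFile = [tempname() '.mat'];
    save(localFile, 'y');
    [ok, msg] = copyfile(localFile, networkFile);
    if ~ok
        error('Copy to %s failed: %s', networkFile, msg);
    end
    delete(localFile);

    % Load: copy from the share to a local temp file, then load that
    localCopy = [tempname() '.mat'];
    [ok, msg] = copyfile(networkFile, localCopy);
    if ~ok
        error('Copy from %s failed: %s', networkFile, msg);
    end
    loaded = load(localCopy);
    delete(localCopy);
    ```

    Whether this beats saving directly to the share depends on the network and file sizes, so it is worth timing both with tic/toc on your own setup.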


    Here's a basic implementation that will let you save variables to separate files using the familiar save() and load() signatures. They're prefixed with "d" to indicate they're the directory-based versions. They use some tricks with evalin() and assignin(), so I thought it would be worth posting the full code.

    function dsave(file, varargin)
    %DSAVE Like save, but each var in its own file
    %
    % dsave filename var1 var2 var3...
    if nargin < 1 || isempty(file); file = 'matlab';  end
    [tfStruct,loc] = ismember({'-struct'}, varargin);
    args = varargin;
    args(loc(tfStruct)) = [];
    if ~all(cellfun(@isvarname, args))
        error('Invalid arguments. Usage: dsave filename <-struct> var1 var2 var3 ...');
    end
    if tfStruct
        structVarName = args{1};
        s = evalin('caller', structVarName);
    else
        varNames = args;
        if isempty(args)
            w = evalin('caller','whos');
            varNames = { w.name };
        end
        captureExpr = ['struct(' ...
            join(',', cellfun(@(x){sprintf('''%s'',{%s}',x,x)}, varNames)) ')'];
        s = evalin('caller', captureExpr);
    end
    
    % Use Java checks to avoid partial path ambiguity
    jFile = java.io.File(file);
    if ~jFile.exists()
        ok = mkdir(file);
        if ~ok; 
            error('failed creating dsave dir %s', file);
        end
    elseif ~jFile.isDirectory()
        error('Cannot save: destination exists but is not a dir: %s', file);
    end
    names = fieldnames(s);
    for i = 1:numel(names)
        varFile = fullfile(file, [names{i} '.mat']);
        varStruct = struct(names{i}, {s.(names{i})});
        save(varFile, '-struct', 'varStruct');
    end
    
    function out = join(Glue, Strings)
    Strings = cellstr(Strings);
    if length( Strings ) == 0
        out = '';
    elseif length( Strings ) == 1
        out = Strings{1};
    else
        Glue = sprintf( Glue ); % Support escape sequences
        out = strcat( Strings(1:end-1), { Glue } );
        out = [ out{:} Strings{end} ];
    end
    

    Here's the load() equivalent.

    function out = dload(file,varargin)
    %DLOAD Like load, but each var in its own file
    if nargin < 1 || isempty(file); file = 'matlab'; end
    varNames = varargin;
    if ~exist(file, 'dir')
        error('Not a dsave dir: %s', file);
    end
    if isempty(varNames)
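        % Use dir() rather than ls(): ls() returns a padded char matrix on
        % Windows but a single whitespace-separated char vector on Unix
        d = dir(fullfile(file, '*.mat'));
        varNames = regexprep({d.name}, '\.mat$', '');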
    end
    
    out = struct;
    for i = 1:numel(varNames)
        name = varNames{i};
        tmp = load(fullfile(file, [name '.mat']));
        out.(name) = tmp.(name);
    end
    
    if nargout == 0
        for i = 1:numel(varNames)
            assignin('caller', varNames{i}, out.(varNames{i}));
        end
        clear out
    end
    

    dwhos() is the equivalent of whos('-file').

    function out = dwhos(file)
    %DWHOS List variable names in a dsave dir
    if nargin < 1 || isempty(file); file = 'matlab'; end
    % Use dir() rather than ls() so the listing has the same shape on all platforms
    d = dir(fullfile(file, '*.mat'));
    out = regexprep({d.name}', '\.mat$', '');
    

    And ddelete() to delete the individual variables like you asked.

    function ddelete(file,varargin)
    %DDELETE Delete variables from a dsave dir
    if nargin < 1 || isempty(file); file = 'matlab'; end
    varNames = varargin;
    for i = 1:numel(varNames)
        delete(fullfile(file, [varNames{i} '.mat']));
    end
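
    To try the underlying idea without installing the functions above, here is a stripped-down sketch using only built-in calls: one MAT file per variable in a directory, with deletion done by removing a file. The directory and variable names are made up for illustration.

    ```matlab
    % One directory holds the "dataset"; each variable gets its own MAT file
    dataDir = tempname();
    mkdir(dataDir);

    x = 1;
    y = rand(3);
    save(fullfile(dataDir, 'x.mat'), 'x');
    save(fullfile(dataDir, 'y.mat'), 'y');

    % Deleting a variable is just deleting its file; x.mat is never touched
    delete(fullfile(dataDir, 'y.mat'));

    % List the remaining variables by stripping the .mat extension
    listing = dir(fullfile(dataDir, '*.mat'));
    remaining = regexprep({listing.name}, '\.mat$', '');
    disp(remaining);
    ```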
    
