Cleanest way to cache function results in MATLAB

轻奢々 2020-12-31 07:30

I have quite a heavy function in MATLAB:

function [out] = f ( in1, in2, in3)

Which is called quite often with the same parameters. The func

3 Answers
  • 2020-12-31 08:03

    Since R2017a, MATLAB ships with a built-in function for exactly this purpose. The technique is called "memoization", and the function is named memoize.

    See the documentation: https://www.mathworks.com/help/matlab/ref/memoize.html
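
    A minimal sketch of how memoize might be applied to the function from the question. The names memoize_demo and heavy are hypothetical stand-ins (heavy plays the role of f), and the pause only simulates an expensive computation:

```matlab
function ok = memoize_demo()
    % memoize wraps a function handle in a MemoizedFunction object.
    mf = memoize(@heavy);
    clearCache(mf);        % start from an empty cache for a fair timing
    mf.CacheSize = 20;     % keep up to 20 distinct input combinations (default is 10)

    in1 = rand(50); in2 = rand(50); in3 = rand(50);

    tic; out1 = mf(in1, in2, in3); toc   % first call runs heavy()
    tic; out2 = mf(in1, in2, in3); toc   % same inputs: answered from the cache

    disp(stats(mf));       % hit/miss statistics of the cache
    ok = isequal(out1, out2);
end

function out = heavy(in1, in2, in3)
    % Hypothetical stand-in for the expensive function f from the question.
    pause(0.5);
    out = in1 + in2 + in3;
end
```

    Note that memoize only makes sense for functions without side effects, since a cached call does not execute the function body again.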

  • 2020-12-31 08:20

    Below is an idea for a CacheableFunction class.

    • All of the answers to your main question point in the same direction: a persistent Map is the consensus way to cache results, and I do this too.
    • If the inputs are arrays, they need to be hashed to a string or scalar before they can be used as a map key. There are many ways to hash your 3 input arrays to a key; I used DataHash in my solution below.
    • I made it a class rather than a function like memoize so that the input hashing function can be specified once at construction time instead of being hardcoded.
    • When the outputs are of a suitable type, it also uses dzip/dunzip to reduce the memory footprint of the saved outputs.
    • Potential improvement: a clever way of deciding which elements to remove from the persistent map when its memory footprint approaches some limit.
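
    For comparison, the plain persistent-Map approach mentioned in the first bullet could be sketched roughly as follows. This is not the class below, just an illustration under stated assumptions: f_cached and makeKey are hypothetical names, f is a stand-in for the expensive function, the inputs are assumed to be real numeric arrays, and the key is an MD5 digest computed through the JVM bundled with MATLAB instead of DataHash:

```matlab
function out = f_cached(in1, in2, in3)
    % Cache survives between calls because the Map is persistent.
    persistent cache
    if ~isa(cache, 'containers.Map')
        cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
    end

    key = makeKey(in1, in2, in3);
    if ~isKey(cache, key)
        cache(key) = f(in1, in2, in3);   % cache miss: compute and store
    end
    out = cache(key);
end

function key = makeKey(varargin)
    % Hash the size and raw bytes of every input into one hex string.
    % Assumption: inputs are real, full numeric arrays.
    md = java.security.MessageDigest.getInstance('MD5');
    for k = 1:numel(varargin)
        x = varargin{k};
        md.update(typecast(double(size(x)), 'uint8'));
        md.update(typecast(double(x(:)), 'uint8'));
    end
    key = sprintf('%02x', typecast(md.digest(), 'uint8'));
end

function out = f(in1, in2, in3)
    % Hypothetical stand-in for the heavy function from the question.
    out = in1 + in2 + in3;
end
```

    Including the array size in the digest keeps a 2-by-8 input from colliding with a 4-by-4 input that happens to hold the same values.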

    Class definition

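    classdef CacheableFunction < handle
        % Caches the outputs of exeFun, keyed by a hash of its inputs.
        properties
            exeFun      % handle taking ONE cell array of inputs, e.g. @(x) f(x{:})
            hashFun     % handle mapping that cell array to a char key
            cacheMap    % containers.Map: key -> cell array of outputs
            nOutputs    % number of outputs to request from exeFun
            zipOutput   % [] until the first evaluation, then true/false
        end

        methods
            function obj = CacheableFunction(exeFun, hashFun, nOutputs)
                obj.exeFun = exeFun;
                obj.hashFun = hashFun;
                obj.cacheMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
                obj.nOutputs = nOutputs;
                obj.zipOutput = [];
            end

            function result = evaluate(obj, varargin)
                % Returns a 1-by-nOutputs cell array of outputs.
                thisKey = obj.hashFun(varargin);

                if isKey(obj.cacheMap, thisKey)
                    % Cache hit: decompress if the stored outputs were zipped.
                    if obj.zipOutput
                        result = cellfun(@dunzip, obj.cacheMap(thisKey), 'UniformOutput', false);
                    else
                        result = obj.cacheMap(thisKey);
                    end
                else
                    % Cache miss: run the function and collect all outputs.
                    result = cell(1, obj.nOutputs);
                    [result{:}] = obj.exeFun(varargin);

                    % Decide once, on the first evaluation, whether to compress.
                    if isempty(obj.zipOutput)
                        obj.zipCheck(result);
                    end

                    if obj.zipOutput
                        obj.cacheMap(thisKey) = cellfun(@dzip, result, 'UniformOutput', false);
                    else
                        obj.cacheMap(thisKey) = result;
                    end
                end
            end

            function zipCheck(obj, C)
                % Compress only if every output is a real, full array of one of these classes.
                zippable = {'double','single','logical','char','int8','uint8', ...
                            'int16','uint16','int32','uint32','int64','uint64'};
                obj.zipOutput = all(cellfun(@(x) isreal(x) && ~issparse(x) && ...
                    any(strcmp(class(x), zippable)), C));
            end

        end
    end
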
    

    Testing it out...

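    function test_caching_perf()

    % exeFun receives the inputs as one cell array and unpacks them;
    % DataHash turns that cell array into the map key.
    A = CacheableFunction(@(x) long_annoying_function(x{:}), @(x) DataHash(x), 3);

    B = rand(50, 50);
    C = rand(50, 50);
    D = rand(50, 50);

    % First call: cache miss, runs the slow function.
    tic;
    myOutput = A.evaluate(B, C, D);
    toc

    % Second call with identical inputs: served from the cache.
    tic;
    myOutput2 = A.evaluate(B, C, D);
    toc

    % Check that cached and freshly computed outputs match.
    cellfun(@isequal, myOutput, myOutput2)

    end

    function [A, B, C] = long_annoying_function(A, B, C)
        % Deliberately slow function used only for the timing test.
        for ii = 1:5000000
            A = A+1;
            B = B+2;
            C = C+3;
        end
    end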
    

    And the results:

    >> test_caching_perf
    Elapsed time is 16.781889 seconds.
    Elapsed time is 0.011116 seconds.
    ans =
        1     1     1
    
  • 2020-12-31 08:21

    A persistent map is indeed a nice way to implement cached results. Advantages I can think of:

    • There is no need to implement a hash function for every data type.
    • MATLAB matrices are copy-on-write, which can save memory when results are stored and returned unmodified.
    • If memory usage is an issue, one can limit how many results are cached.
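
    The last point could be sketched like this. It is only an illustration: cached_call is a hypothetical name, the caller is assumed to supply a char key, and first-in-first-out eviction is an arbitrary choice of policy, not something prescribed above:

```matlab
function out = cached_call(fn, key, maxEntries)
    % fn         : zero-argument handle that computes the result on a miss
    % key        : char key identifying the inputs (supplied by the caller)
    % maxEntries : maximum number of results kept in the cache
    persistent cache order
    if ~isa(cache, 'containers.Map')
        cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
        order = {};                  % keys in insertion order, oldest first
    end

    if isKey(cache, key)
        out = cache(key);            % cache hit
        return
    end

    out = fn();                      % cache miss: compute the result

    if cache.Count >= maxEntries
        remove(cache, order{1});     % evict the oldest entry (FIFO)
        order(1) = [];
    end
    cache(key) = out;
    order{end+1} = key;
end
```

    A least-recently-used policy would need the order list to be updated on every hit as well; FIFO is used here only because it is the shortest to write down.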

    There is a File Exchange submission, "A multidimensional map class" by David Young, that comes with a function memoize() which does exactly this. Its implementation uses a slightly different mechanism (a referenced local variable), but the idea is about the same. Compared with a persistent map inside each function, this memoize() function allows an existing function to be memoized without modification. And as pointed out by Oleg, using DataHash (or an equivalent) can further reduce the memory usage.
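
    To illustrate the general idea of memoizing an existing function without touching it, here is a rough sketch. It is not David Young's implementation: simple_memoize and lookup are hypothetical names, the wrapper handles a single output only, and the caller is assumed to supply keyFn, a function that turns the cell array of inputs into a char key:

```matlab
function mf = simple_memoize(fn, keyFn)
    % Returns a handle that behaves like fn but remembers earlier results.
    % Example (hypothetical): 
    %   fastSquare = simple_memoize(@(x) x.^2, @(args) mat2str(args{1}, 17));
    %   y = fastSquare(3);
    % containers.Map is a handle object, so the anonymous function below
    % keeps referring to the same cache on every call.
    cache = containers.Map('KeyType', 'char', 'ValueType', 'any');
    mf = @(varargin) lookup(cache, fn, keyFn, varargin);
end

function out = lookup(cache, fn, keyFn, args)
    key = keyFn(args);
    if ~isKey(cache, key)
        cache(key) = fn(args{:});    % cache miss: call the original function
    end
    out = cache(key);
end
```

    Each call to simple_memoize creates its own cache, so two wrappers around the same function do not share results.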

    PS: I have used the MapN class extensively and it is quite reliable. Actually I have submitted a bug report and the author fixed it promptly.
