Caching Matlab function results to file

前端 未结 3 616
旧巷少年郎
旧巷少年郎 2021-01-20 15:38

I\'m writing a simulation in Matlab. I will eventually run this simulation hundreds of times. In each simulation run, there are millions of simulation cycles. In each of the

相关标签:
3条回答
  • 2021-01-20 16:03

    The answer to both questions is to use a map. There are a few steps to do this.

    1. First you will need a function to turn your bit_array into either a number or a string. For example, turn [0 1 1 0 1 0] into '011010'. (Matlab only supports scalar or string keys, which is why this step is required.)

    2. Defined a map object

      cachedRunMap = containers.Map;  %See edit below for more on this
      
    3. To check if a particular case has been run, use iskey.

      cachedRunMap.isKey('011010');
      
    4. To add the results of a run use the appending syntax

      cachedRunMap('011010') = [0 1 1 0 1];  %Or whatever your result is.  
      
    5. To retrieve cached results, use the getting syntax

      tmpResult = cachedRunMap.values({'011010'});
      

    This should efficiently store and retrieve values until you run out of system memory.


    Putting this together, now your code would look like this:

    %Hacky magic function to convert an array into a string of '0' and '1'
    strFromBits = @(x) char((x(:)'~=0)+48); %'
    
    %Initialize the map
    cachedRunMap = containers.Map;
    
    %Loop, computing and storing results as needed
    for i=1:1000000000
        %pick a bit array somehow
        strKey = strFromBits(bit_array);
        if cachedRunMap.isKey(strKey)
            result = cachedRunMap(strKey);
        else
            result = complex_function(bit_array);
            cachedRunMap(strKey) = reult;
        end
        %do something with result
    end
    

    If you want a key which is not a string, that needs to be declared at step 2. Some examples are:

    cachedRunMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'double', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'any');
    cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'double');
    

    Setting a KeyType of 'char' sets the map to use strings as keys. All other types must be scalars.


    Regarding issues as you scale this up (per your recent comments)

    • Saving data between sessions: There should be no issues saving this map to a *.mat file, up to the limits of your systems memory

    • Purging old data: I am not aware of a straightforward way to add LRU features to this map. If you can find a Java implementation you can use it within Matlab pretty easily. Otherwise it would take some thought to determine the most efficient method of keeping track of the last time a key was used.

    • Sharing data between concurrent sessions: As you indicated, this probably requires a database to perform efficiently. The DB table would be two columns (3 if you want to implement LRU features), the key, value, (and last used time if desired). If your "result" is not a type which easily fits into SQL (e.g. a non-uniform size array, or complex structure) then you will need to put additional thought into how to store it. You will also need a method to access the database (e.g. the database toolbox, or various tools on the Mathworks file exchange). Finally you will need to actually setup a database on a server (e.g. MySql if you are cheap, like me, or whatever you have the most experience with, or can find the most help with.) This is not actually that hard, but it takes a bit of time and effort the first time through.

      Another approach to consider (much less efficient, but not requiring a database) would be to break up the data store into a large (e.g. 1000's or millions) number of maps. Save each into a separate *.mat file, with a filename based on the keys contained in that map (e.g. the first N characters of your string key), and then load/save these files between sessions as needed. This will be pretty slow ... depending on your usage it may be faster to recalculate from the source function each time ... but it's the best way I can think of without setting up the DB (clearly a better answer).

    0 讨论(0)
  • 2021-01-20 16:09
    1. For a large list, a hand-coded binary search can beat ismember, if maintaining it in sorted order isn't too expensive. If that's really your bottleneck. Use the profiler to see how much the ismember is really costing you. If there aren't too many distinct values, you could also store them in a containers.Map by packing the bit_matrix in to a char array and using it as the key.

    2. If it's small enough to fit in memory, you could store it in a MAT file using save and load. They can store any basic Matlab datatype. Have the simulation save the accumulated res and bit_matrix at the end of its run, and re-load them the next time it's called.

    0 讨论(0)
  • 2021-01-20 16:09

    I think that you should use containers.Map() for the purpose of speedup.

    The general idea is to hold a map that contains all hash values. If your bit arrays have uniform distribution under the hash function, most of the time you won't need the call to ismember.

    Since the key type cannot be an array in Matlab, you can calculate some hash function on your array of bits.

    For example:

     function s = GetHash(bitArray)
          s = mod( sum(bitArray), intmax('uint32'));          
     end
    

    This is a lousy hash function, but enough to understand the principle. Then the code would look like:

    map = containers.Map('KeyType','uint32','ValueType','any');
    for i=1:1000000000
        %pick a bit array somehow
        s = GetHash(bit_array);   
        if isKey  %Do the slow check.
            [~,indx] = ismember(bit_array,bit_matrix,'rows');
        else
           map(s) = 1;
           continue;
        end
        if indx == 0
            indx = length(results) + 1;
            bit_matrix(indx,:) = bit_array;
            res(indx) = complex_function(bit_array);
        end
        result = res(indx)
        %do something with result
    end
    
    0 讨论(0)
提交回复
热议问题