Efficient alternative to Outer on sparse arrays in Mathematica?

前端 未结 3 1220
挽巷
挽巷 2021-02-08 23:45

Suppose I have two very large lists {a1, a2, …} and {b1, b2, …} where all ai and bj are large sparse arrays. For the sake of memory efficiency I store each list as one comprehen

相关标签:
3条回答
  • 2021-02-09 00:25

    If lst1 and lst2 are your lists,

    Reap[
       Do[Sow[f[#1[[i]], #2[[j]]]],
           {i, 1, Length@#1},
           {j, 1, Length@#2}
           ] &[lst1, lst2];
       ] // Last // Last
    

    does the job and may be more memory-efficient. On the other hand, maybe not. Nasser is right, an explicit example would be useful.

    EDIT: Using Nasser's randomly-generated arrays, and for len=200, MaxMemoryUsed[] indicates that this form needs 170MB while the Outer form in the question takes 435MB.

    0 讨论(0)
  • 2021-02-09 00:35

    I will propose a solution which is rather complex but allows one to only use about twice as much memory during the computation as is needed to store the final result as a SparseArray. The price to pay for this will be a much slower execution.

    The code

    Sparse array construction / deconstruction API

    Here is the code. First, a slightly modified (to address higher-dimensional sparse arrays) sparse array construction - deconstruction API, taken from this answer:

    ClearAll[spart, getIC, getJR, getSparseData, getDefaultElement, 
      makeSparseArray];
    HoldPattern[spart[SparseArray[s___], p_]] := {s}[[p]];
    getIC[s_SparseArray] := spart[s, 4][[2, 1]];
    getJR[s_SparseArray] := spart[s, 4][[2, 2]];
    getSparseData[s_SparseArray] := spart[s, 4][[3]];
    getDefaultElement[s_SparseArray] := spart[s, 3];
    makeSparseArray[dims_List, jc_List, ir_List, data_List, defElem_: 0] :=
       SparseArray @@ {Automatic, dims, defElem, {1, {jc, ir}, data}};
    

    Iterators

    The following functions produce iterators. Iterators are a good way to encapsulate the iteration process.

    ClearAll[makeTwoListIterator];
    makeTwoListIterator[fname_Symbol, a_List, b_List] :=
      With[{indices = Flatten[Outer[List, a, b, 1], 1]},
       With[{len  = Length[indices]},
        Module[{i = 0},
          ClearAll[fname];
          fname[] := With[{ind = ++i}, indices[[ind]] /; ind <= len];
          fname[] := Null;
          fname[n_] := 
            With[{ind = i + 1}, i += n; 
               indices[[ind ;; Min[len, ind + n - 1]]] /; ind <= len];
          fname[n_] := Null;
     ]]];
    

    Note that I could have implemented the above function more memory - efficiently and not use Outer in it, but for our purposes this won't be the major concern.

    Here is a more specialized version, which produces interators for pairs of 2-dimensional indices.

    ClearAll[make2DIndexInterator];
    make2DIndexInterator[fname_Symbol, i : {iStart_, iEnd_}, j : {jStart_, jEnd_}] :=
       makeTwoListIterator[fname, Range @@ i, Range @@ j];
     make2DIndexInterator[fname_Symbol, ilen_Integer, jlen_Integer] :=
       make2DIndexInterator[fname, {1, ilen}, {1, jlen}];
    

    Here is how this works:

    In[14]:= 
    makeTwoListIterator[next,{a,b,c},{d,e}];
    next[]
    next[]
    next[]
    
    Out[15]= {a,d}
    Out[16]= {a,e}
    Out[17]= {b,d}
    

    We can also use this to get batch results:

    In[18]:= 
    makeTwoListIterator[next,{a,b,c},{d,e}];
    next[2]
    next[2]
    
    Out[19]= {{a,d},{a,e}}
    Out[20]= {{b,d},{b,e}}
    

    , and we will be using this second form.

    SparseArray - building function

    This function will build a SparseArray object iteratively, by getting chunks of data (also in SparseArray form) and gluing them together. It is basically code used in this answer, packaged into a function. It accepts the code piece used to produce the next chunk of data, wrapped in Hold (I could alternatively make it HoldAll)

    Clear[accumulateSparseArray];
    accumulateSparseArray[Hold[getDataChunkCode_]] :=
      Module[{start, ic, jr, sparseData, dims, dataChunk},
       start = getDataChunkCode;
       ic = getIC[start];
       jr = getJR[start];
       sparseData = getSparseData[start];
       dims = Dimensions[start];
       While[True, dataChunk = getDataChunkCode;
         If[dataChunk === {}, Break[]];
         ic = Join[ic, Rest@getIC[dataChunk] + Last@ic];
         jr = Join[jr, getJR[dataChunk]];
         sparseData = Join[sparseData, getSparseData[dataChunk]];
         dims[[1]] += First[Dimensions[dataChunk]];
       ];
       makeSparseArray[dims, ic, jr, sparseData]];
    

    Putting it all together

    This function is the main one, putting it all together:

    ClearAll[sparseArrayOuter];
    sparseArrayOuter[f_, a_SparseArray, b_SparseArray, chunkSize_: 100] := 
      Module[{next, wrapperF, getDataChunkCode},
        make2DIndexInterator[next, Length@a, Length@b];
        wrapperF[x_List, y_List] := SparseArray[f @@@ Transpose[{x, y}]];
        getDataChunkCode :=
          With[{inds = next[chunkSize]},
             If[inds === Null, Return[{}]];
             wrapperF[a[[#]] & /@ inds[[All, 1]], b[[#]] & /@ inds[[All, -1]]]
          ];
        accumulateSparseArray[Hold[getDataChunkCode]]
      ];
    

    Here, we first produce the iterator which will give us on demand portions of index pair list, used to extract the elements (also SparseArrays). Note that we will generally extract more than one pair of elements from two large input SparseArray-s at a time, to speed up the code. How many pairs we process at once is governed by the optional chunkSize parameter, which defaults to 100. We then construct the code to process these elements and put the result back into SparseArray, where we use an auxiliary function wrapperF. The use of iterators wasn't absolutely necessary (could use Reap-Sow instead, as with other answers), but allowed me to decouple the logic of iteration from the logic of generic accumulation of sparse arrays.

    Benchmarks

    First we prepare large sparse arrays and test our functionality:

    In[49]:= 
    arr = {SparseArray[{{1,1,1,1}->1,{2,2,2,2}->1}],SparseArray[{{1,1,1,2}->1,{2,2,2,1}->1}],
    SparseArray[{{1,1,2,1}->1,{2,2,1,2}->1}],SparseArray[{{1,1,2,2}->-1,{2,2,1,1}->1}],
    SparseArray[{{1,2,1,1}->1,{2,1,2,2}->1}],SparseArray[{{1,2,1,2}->1,{2,1,2,1}->1}]};
    
    In[50]:= list=SparseArray[arr]
    Out[50]= SparseArray[<12>,{6,2,2,2,2}]
    
    In[51]:= larger = sparseArrayOuter[Dot,list,list]
    Out[51]= SparseArray[<72>,{36,2,2,2,2,2,2}]
    
    In[52]:= (large= sparseArrayOuter[Dot,larger,larger])//Timing
    Out[52]= {0.047,SparseArray[<2592>,{1296,2,2,2,2,2,2,2,2,2,2}]}
    
    In[53]:= SparseArray[Flatten[Outer[Dot,larger,larger,1],1]]==large
    Out[53]= True
    
    In[54]:= MaxMemoryUsed[]
    Out[54]= 21347336
    

    Now we do the power tests

    In[55]:= (huge= sparseArrayOuter[Dot,large,large,2000])//Timing
    Out[55]= {114.344,SparseArray[<3359232>,{1679616,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}]}
    
    In[56]:= MaxMemoryUsed[]
    Out[56]= 536941120
    
    In[57]:= ByteCount[huge]
    Out[57]= 262021120
    
    In[58]:= (huge1 = Flatten[Outer[Dot,large,large,1],1]);//Timing
    Out[58]= {8.687,Null}
    
    In[59]:= MaxMemoryUsed[]
    Out[59]= 2527281392
    

    For this particular example, the suggested method is 5 times more memory-efficient than the direct use of Outer, but about 15 times slower. I had to tweak the chunksize parameter (default is 100, but for the above I used 2000, to get the optimal speed / memory use combination). My method only used as a peak value twice as much memory as needed to store the final result. The degree of memory-savings as compared to Outer- based method will depend on the sparse arrays in question.

    0 讨论(0)
  • 2021-02-09 00:39

    Using your example list data, I believe that you will find the ability to Append to a SparseArray quite helpful.

    acc = SparseArray[{}, {1, 2, 2, 2, 2, 2, 2}]
    
    Do[AppendTo[acc, i.j], {i, list}, {j, list}]
    
    Rest[acc]
    

    I need Rest to drop the first zero-filled tensor in the result. The second argument of the seed SparseArray must be the dimensions of each of your elements with a prefixed 1. You may need to explicitly specify a background for the seed SparseArray to optimize performance.

    0 讨论(0)
提交回复
热议问题