Mathematica fast 2D binning algorithm

邮差的信 提交于 2019-11-30 02:21:46

Here is a method based on Szabolcs's post that is about about an order of magnitude faster.

data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];

epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];

res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]}, 
    SparseArray[
     gb[[All, 1, 1]] -> 
      Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming

Gives about {2.012217, Null}

AbsoluteTiming[
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
 res3 = SparseArray[indexes -> zvalues];
 System`SetSystemOptions[ 
  "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
 ]

Gives about {0.195228, Null}

res3 == res2
True

"TreatRepeatedEntries" -> 1 adds duplicate positions up.

I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.


Assuming I understand your requirements, I propose:

data = RandomReal[100, {75, 3}];

bins = {0, 20, 40, 60, 80, 100};

Reap[
  Sow[{#3, #2}, bins ~Nearest~ #] & @@@ data,
  bins,
  Reap[Sow[#, bins ~Nearest~ #2] & @@@ #2, bins, Tr@#2 &][[2]] &
][[2]] ~Flatten~ 1 ~Total~ {3} // MatrixForm

Refactored:

f[bins_] := Reap[Sow[{##2}, bins ~Nearest~ #]& @@@ #, bins, #2][[2]] &

bin2D[data_, X_, Y_] := f[X][data, f[Y][#2, #2~Total~2 &] &] ~Flatten~ 1 ~Total~ {3}

Use:

bin2D[data, xbins, ybins]

Here's my approach:

data = RandomReal[5, {500000, 3}]; (* 500k values *)

zvalues = data[[All, 3]];

epsilon = 1*^-10; (* prevent 101 index *)

(* rescale and round (x,y) coordinates to index pairs in the 1..100 range *)    
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];

(* approach 1: create bin-matrix first, then fill up elements by adding  zvalues *)
res1 = Module[
    {result = ConstantArray[0, {100, 100}]},
    Do[
      AddTo[result[[##]], zvalues[[i]]] & @@ indexes[[i]], 
      {i, Length[indexes]}
    ];
    result
    ]; // Timing

(* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *)
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
    SparseArray[gb[[All, 1, 1]] -> (Total /@ gb[[All, All, 2]])]
    ]; // Timing

res1 == res2

These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?

faysou

Here's my approach using the function SelectEquivalents defined in What is in your Mathematica tool bag? which is perfect for a problem like this one.

data = RandomReal[100, {75, 3}];
bins = Range[0, 100, 20];
binMiddles = (Most@bins + Rest@bins)/2;
nearest = Nearest[binMiddles];

SelectEquivalents[
   data
   ,
   TagElement -> ({First@nearest[#[[1]]], First@nearest[#[[2]]]} &)
   ,
   TransformElement -> (#[[3]] &)
   ,
   TransformResults -> (Total[#2] &)
   ,
   TagPattern -> Flatten[Outer[List, binMiddles, binMiddles], 1]
   , 
   FinalFunction -> (Partition[Flatten[# /. {} -> 0], Length[binMiddles]] &)
]

If you would want to group according to more than two dimensions you could use in FinalFunction this function to give to the list result the desired dimension (I don't remember where I found it).

InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!