问题
Many times I find myself counting occurrences with Tally[ ]
and then, once I discarded the original list, having to add (and join) to that counters list the results from another list.
This typically happens when I am counting configurations, occurrences, doing some discrete statistics, etc.
So I defined a very simple but handy function for Tally aggregation:
aggTally[listUnTallied__List:{},
listUnTallied1_List,
listTallied_List] :=
Join[Tally@Join[listUnTallied, listUnTallied1], listTallied] //.
{a___, {x_, p_}, b___, {x_, q_}, c___} -> {a, {x, p + q}, b, c};
Such that
l = {x, y, z}; lt = Tally@l;
n = {x};
m = {x, y, t};
aggTally[n, {}]
{{x, 1}}
aggTally[m, n, {}]
{{x, 2}, {y, 1}, {t, 1}}
aggTally[m, n, lt]
{{x, 3}, {y, 2}, {t, 1}, {z, 1}}
This function has two problems:
1) Performance
Timing[Fold[aggTally[Range@#2, #1] &, {}, Range[100]];]
{23.656, Null}
(* functional equivalent to *)
Timing[s = {}; j = 1; While[j < 100, s = aggTally[Range@j, s]; j++]]
{23.047, Null}
2) It does not validate that the last argument is a real Tallied list or null (less important for me, though)
Is there a simple, elegant, faster and more effective solution? (I understand that these are too many requirements, but wishing is free)
回答1:
Perhaps, this will suit your needs?
aggTallyAlt[listUnTallied__List : {}, listUnTallied1_List, listTallied : {{_, _Integer} ...}] :=
{#[[1, 1]], Total@#[[All, 2]]} & /@
GatherBy[Join[Tally@Join[listUnTallied, listUnTallied1], listTallied], First]
The timings are much better, and there is a pattern-based check on the last arg.
EDIT:
Here is a faster version:
aggTallyAlt1[listUnTallied__List : {}, listUnTallied1_List, listTallied : {{_, _Integer} ...}] :=
Transpose[{#[[All, 1, 1]], Total[#[[All, All, 2]], {2}]}] &@
GatherBy[Join[Tally@Join[listUnTallied, listUnTallied1], listTallied], First]
The timings for it:
In[39]:= Timing[Fold[aggTallyAlt1[Range@#2, #1] &, {}, Range[100]];]
Timing[s = {}; j = 1; While[j < 100, s = aggTallyAlt1[Range@j, s]; j++]]
Out[39]= {0.015, Null}
Out[40]= {0.016, Null}
回答2:
The following solution is just a small modification of your original function. It applies Sort
before using ReplaceRepeated
and can thus use a less general replacement pattern which makes it much faster:
aggTally[listUnTallied__List : {}, listUnTallied1_List,
listTallied : {{_, _Integer} ...}] :=
Sort[Join[Tally@Join[listUnTallied, listUnTallied1],
listTallied]] //. {a___, {x_, p_}, {x_, q_}, c___} -> {a, {x, p + q}, c};
回答3:
Here's the fastest thing I've come up with yet, (ab)using the tagging available with Sow
and Reap
:
aggTally5[untallied___List, tallied_List: {}] :=
Last[Reap[
Scan[((Sow[#2, #] &) @@@ Tally[#]) &, {untallied}];
Sow[#2, #] & @@@ tallied;
, _, {#, Total[#2]} &]]
Not going to win any beauty contests, but it's all about speed, right? =)
回答4:
If you stay purely symbolic, you may try something along the lines of
(Plus @@ Times @@@ Join[#1, #2] /. Plus -> List /. Times -> List) &
for joining tally lists. This is stupid fast but returns something that isn't a tally list, so it needs some work (after which it may not be so fast anymore ;) ).
EDIT: So I've got a working version:
aggT = Replace[(Plus @@ Times @@@ Join[#1, #2]
/. Plus -> List
/. Times[a_, b_] :> List[b, a]),
k_Symbol -> List[k, 1], {1}] &;
Using a couple of random symbolic tables I get
a := Tally@b;
b := Table[f[RandomInteger@99 + 1], {i, 100}];
Timing[Fold[aggT[#1, #2] &, a, Table[a, {i, 100}]];]
--> {0.104954, Null}
This version only adds tally lists, doesn't check anything, still returns some integers, and comparing to Leonid's function:
Timing[Fold[aggTallyAlt1[#2, #1] &, a, Table[b, {i, 100}]];]
--> {0.087039, Null}
it's already a couple of seconds slower :-(.
Oh well, nice try.
来源:https://stackoverflow.com/questions/5143575/aggregating-tally-counters