There is a strange bug that has been in Mathematica for years, at least since version 5.1, and persisting through version 7.
Module[{f, L}, L = f[];
Do[L
As to version 8: I tried 100,000 random strings of various lengths and didn't find anything out of the ordinary.
chars = StringCases[CharacterRange["A", "z"], WordCharacter] //Flatten;
res = Table[
ToExpression[
StringReplace[
"First[AbsoluteTiming[Module[{weirdness,L},L=weirdness[];\
\[IndentingNewLine]Do[L=weirdness[L,i],{i,10^4}]]]]",
"weirdness" ->
StringJoin[
RandomChoice[chars, RandomInteger[{1, 20}]]]]], {100000}];
What causes this? Is it a hashing problem?
Yes, more or less.
Is it fixed in Version 8?
Yes (also more or less). That is to say, it is not possible to fix in any "complete" sense. But the most common cases are much better handled.
Is there a way to know what symbol names cause a slowdown, other than testing?
No way of which I am aware.
In version 7 there is an earlier fix of a similar nature to the one in version 8. It was off by default (we'd not had adequate time to test it when we shipped, and it did not get turned on for version 7.0.1). It can be accessed as follows.
SetSystemOptions["NeedNotReevaluateOptions"->{"UseSymbolLists"->True}];
This brings your example back to the realm of the reasonable.
Module[{weirdness, L}, L = weirdness[];
Do[L = weirdness[L, i], {i, 10^4}]] // Timing
Out[8]= {0.020997, Null}
---edit---
I can explain the optimization involved here in slightly more detail. First recall that Mathematica emulates "infinite evaluation", that is, expressions keep evaluating until they no longer change. This can be costly and hence requires careful short circuit optimizations to forestall it when possible.
A mechanism we use is a variant of hashing, that serves to indicate that symbols on which an expression might depend are unchanged and hence that expression is unchanged. It is here that collisions might occur, thus necessitating more work.
In a bad case, the Mathematica kernel might need to walk the entire expression in order to determine that it is unchanged. This walk can be as costly as reevaluation. An optimization, new to version 7 (noted above), is to record explicitly, for some types of expression, those symbols upon which it depends. Then the reevaluation check can be shortened by simply checking that none of these symbols has been changed since the last time the expression was evaluated.
The implementation details are a bit involved (and also a bit proprietary, though perhaps not so hard to guess). But that, in brief, is what is going on under the hood. Earlier versions sometimes did significant expression traversal just to discover that the expression needed no reevaluation. This can still happen, but it is a much more rare event now.
---end edit---
Daniel Lichtblau Wolfram Research