Why is the first run always much slower?

喜欢而已 submitted on 2019-12-05 13:04:23

The first run always takes significantly more time than subsequent ones. Why is that so?

There's another tricky dependency factoring into your benchmark results: I/O. Try a few test runs that return the timing vectors rather than print them, and you should see results more in line with this:

(for [_ (range 10)]
  (times 10 (Thread/yield)))
=>
([32674 1539 1068 1063 1027 1026 1025 1031 1034 1035]
 [1335 1048 1030 1036 1043 1037 1036 1031 1034 1047]
 [1088 1043 1029 1035 1045 1035 1036 1035 1045 1047]
 [1051 1037 1032 1031 1048 1045 1039 1045 1042 1037]
 [1054 1048 1032 1036 1046 1029 1038 1038 1039 1051]
 [1050 1051 1039 1037 1038 1035 1030 1030 1045 1031]
 [1054 1045 1034 1034 1045 1037 1037 1035 1046 1044]
 [1051 1041 1032 1050 1061 1039 1045 1041 1057 1034]
 [1052 1042 1034 1032 1035 1045 1043 1038 1052 1052]
 [1053 1053 1041 1043 1053 1044 1039 1042 1051 1038])
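The definition of times is not shown here. As a rough sketch only, a macro with the same shape might look like the following. It assumes that times evaluates the expression n times and collects one elapsed time per run with conj, and that System/nanoTime is the clock; the units of the figures above are not stated, so nanoseconds is a guess.

```clojure
;; Hypothetical reconstruction: the original `times` is not shown in the post.
(defmacro times
  "Evaluates expr n times and returns a vector with the elapsed
  nanoseconds of each evaluation, in order."
  [n expr]
  `(loop [i# 0, acc# []]
     (if (< i# ~n)
       (let [start# (System/nanoTime)]
         ~expr                                  ; the code being timed
         (recur (inc i#)
                (conj acc# (- (System/nanoTime) start#))))
       acc#)))

;; Returning the vectors instead of printing them keeps I/O out of the timings.
(doall (for [_ (range 10)]
         (times 10 (Thread/yield))))
```

Because the result is built and returned rather than printed inside the loop, any cost of writing to the console falls outside the measured region.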

If you use System.out.println in your benchmark instead of prn, you should see the same slow-down behavior but much less exaggerated:

(dotimes [x 10]
  (.println System/out (times 10 (Thread/yield))))
=> nil
[33521 1733 1232 1161 1150 1135 1151 1138 1143 1144]
[1724 1205 1149 1152 1141 1149 1149 1150 1139 1145]
[1368 1156 1141 1139 1147 1149 1141 1147 1141 1149]
[1306 1159 1150 1141 1150 1148 1147 1142 1144 1149]
[1329 1161 1155 1144 1140 1155 1151 1149 1149 1140]
[1319 1154 1140 1143 1147 1154 1156 1149 1148 1145]
[1291 1166 1164 1149 1140 1150 1140 1152 1141 1139]
[4482 1194 1148 1150 1137 1165 1163 1154 1149 1152]
[1333 1184 1162 1163 1138 1149 1150 1151 1137 1145]
[1318 1150 1144 1150 1151 1147 1138 1147 1143 1149]

You can see this effect even with a much less expensive, and less I/O-bound, operation than (Thread/yield), such as the constant expression 5:

user=> (doall (for [_ (range 10)] (times 10 5)))
[[390 132 134 132 109 86 94 109 115 112]
 [115 117 114 112 112 89 112 112 115 89]
 [117 106 109 109 109 86 109 109 111 109]
 [121 106 103 103 109 86 106 106 129 109]
 [117 109 106 109 112 95 111 112 109 89]
 [112 112 111 111 114 92 109 112 109 114]
 [118 111 112 111 115 88 112 109 115 92]
 [112 108 108 111 109 92 109 109 118 89]
 [115 106 112 115 112 89 112 109 114 89]
 [117 109 112 112 114 89 114 112 111 91]]

Quite interesting, isn't it? The first timing in each vector is always the slowest, or at least very close to the slowest, and bizarrely the sixth and tenth tend to be the fastest. Why should this be?

My best guess is just the mysterious power of HotSpot. There are a number of dynamic-dispatch methods being called even in this very short snippet. You call conj as an IFn, and perhaps HotSpot builds up some confidence that most of your IFn calls will be to conj, so it tries to make that case faster. But at the end of each iteration of 10, some other functions are called to append to the larger result list, so HotSpot backs off its optimizations, anticipating that you will start doing something else.

Or maybe it's not HotSpot at all, but rather some interaction with the CPU cache, or the operating system's virtual memory manager, or...

Of course this specific scenario is all speculation, but the point is that even when you write very simple code, you rely on a large number of very complicated systems to run it for you, and the end result is basically unknowable without devoting a great deal of study to each of the systems involved.
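If the goal is only to keep the slow first runs out of the reported numbers, one common approach is to run the code a number of times untimed before measuring. The sketch below is an illustration under that assumption, not an explanation of the slowdown; elapsed-ns and measure are hypothetical names, and the warm-up count is arbitrary.

```clojure
;; Hypothetical helpers for illustration; not part of the original post.
(defn elapsed-ns
  "Calls the zero-argument function f once and returns the elapsed
  time in nanoseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (- (System/nanoTime) start)))

(defn measure
  "Calls f warmup times without timing it, then returns a vector of
  n timings taken afterwards."
  [warmup n f]
  (dotimes [_ warmup] (f))               ; untimed warm-up calls
  (vec (repeatedly n #(elapsed-ns f))))  ; timed calls, realized eagerly

;; Example: 1000 untimed calls, then 10 timed ones.
(measure 1000 10 #(Thread/yield))
```

Even with a warm-up phase, individual timings can still vary for the reasons speculated about above, so the result is best read as a distribution rather than a single number.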
