Algorithms for modern hardware?

死守一世寂寞 2021-01-31 21:32

Once again, I find myself with a set of broken assumptions. The article itself is about a 10x performance gain by modifying a proven-optimal algorithm to account for virtual memory.

5 Answers
  • 2021-01-31 22:09

    I'll expand on GregS's answer: the difference is between effective complexity and asymptotic complexity. Asymptotic complexity ignores constant factors and is valid only for “large enough” inputs. Oftentimes “large enough” can actually mean “larger than any computer can deal with, now and for a few decades”; this is where the theory (justifiably) gets a bad reputation. Of course there are also cases where “large enough” means n = 3!

    A more nuanced (and thus more accurate) way of looking at this is to first ask “what is the size range of problems you are interested in?” Then measure the efficiency of the various algorithms in that size range to get a feel for the ‘hidden constants’. Or you can use finer-grained methods of algorithmic analysis that actually give you estimates of the constants.
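
    As a minimal sketch of such a measurement, assuming the two candidates are a simple insertion sort and std::sort (purely illustrative stand-ins), a timing harness over a handful of sizes might look like this:

        #include <algorithm>
        #include <chrono>
        #include <cstddef>
        #include <cstdio>
        #include <random>
        #include <utility>
        #include <vector>

        // Simple O(n^2) insertion sort -- small constant, poor asymptotics.
        void insertion_sort(std::vector<int>& v) {
            for (std::size_t i = 1; i < v.size(); ++i)
                for (std::size_t j = i; j > 0 && v[j - 1] > v[j]; --j)
                    std::swap(v[j - 1], v[j]);
        }

        // Time a single call of f in milliseconds.
        template <typename F>
        double time_ms(F f) {
            auto t0 = std::chrono::steady_clock::now();
            f();
            auto t1 = std::chrono::steady_clock::now();
            return std::chrono::duration<double, std::milli>(t1 - t0).count();
        }

        int main() {
            std::mt19937 rng(42);
            // Only the sizes you actually care about -- this is where the
            // hidden constants show up.
            for (std::size_t n : {16u, 64u, 256u, 1024u, 4096u}) {
                std::vector<int> a(n);
                for (auto& x : a) x = static_cast<int>(rng());
                auto b = a;
                double t_ins = time_ms([&] { insertion_sort(a); });
                double t_std = time_ms([&] { std::sort(b.begin(), b.end()); });
                std::printf("n=%5zu  insertion=%9.4f ms  std::sort=%9.4f ms\n", n, t_ins, t_std);
            }
        }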

    The other thing to look at is ‘transition points’. Of course an algorithm which runs in 2n² time will be faster than one that runs in 10¹⁶·n·log(n) time for all n < 1.99 × 10¹⁷. So the quadratic algorithm will be the one to choose (unless you are dealing with the sizes of data that CERN worries about). Even lower-order terms can bite – 3n³ is a lot better than n³ + 10¹⁶n² for n < 5 × 10¹⁵ (assuming that these are actual complexities).
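
    One way to see where such a transition point falls is to plug the two hypothetical cost models into a small scan for the crossover (the 1.99 × 10¹⁷ figure above assumes a natural logarithm):

        #include <cmath>
        #include <cstdio>

        // Hypothetical cost models from this answer: an n*log(n) algorithm with a
        // huge constant factor versus a quadratic algorithm with a tiny one.
        double huge_constant_nlogn(double n)      { return 1e16 * n * std::log(n); }
        double small_constant_quadratic(double n) { return 2.0 * n * n; }

        int main() {
            // Scan doublings of n until the quadratic model finally becomes the
            // slower of the two; the crossover lands near 2e17, in line with the
            // ~1.99e17 figure quoted above.
            for (double n = 2.0; n < 1e20; n *= 2.0) {
                if (small_constant_quadratic(n) > huge_constant_nlogn(n)) {
                    std::printf("quadratic overtakes n*log(n) somewhere below n = %.3g\n", n);
                    return 0;
                }
            }
            std::printf("no crossover found in the scanned range\n");
        }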

  • 2021-01-31 22:11

    O(n) is only part of the story -- a big part, and frequently the dominant part, but not always. Once you get to performance optimization (which should not be done too early in your development), you need to consider all of the resources you are using. You can generalize Amdahl's Law to mean that your execution time will be dominated by the most limited resource. Note that this also means the particular hardware you're executing on must be considered. A program that is highly optimized and extremely efficient for a massively parallel computer (e.g., CM or MassPar) would probably not do well on a big vector box (e.g., Cray-2), nor on a high-speed microprocessor. That program might not even do well on a massive array of capable microprocessors (map/reduce style). Different balances of cache, CPU-to-CPU communication, I/O, CPU speed, memory access, and so on call for different optimizations and yield different performance.
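
    For reference, a minimal sketch of Amdahl's Law in code form, with purely illustrative numbers:

        #include <cstdio>

        // Amdahl's Law: overall speedup when a fraction p of the run time is
        // accelerated by a factor s (and the remaining 1-p is untouched).
        double amdahl_speedup(double p, double s) {
            return 1.0 / ((1.0 - p) + p / s);
        }

        int main() {
            // Illustrative numbers only: a 10x faster CPU barely matters if the
            // CPU accounts for just 5% of the run time.
            std::printf("CPU 10x faster, CPU is  5%% of time: %.2fx overall\n",
                        amdahl_speedup(0.05, 10.0));
            std::printf("CPU 10x faster, CPU is 95%% of time: %.2fx overall\n",
                        amdahl_speedup(0.95, 10.0));
        }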

    Back when I spent time working on performance optimization, we would strive for "balanced" performance over the whole system. A super-fast CPU with a slow I/O system rarely made sense, and so on. O() typically considers only CPU complexity. You may be able to trade memory space for speed (unrolling loops makes no difference in O() terms, but it frequently helps real performance); you may need to worry about cache hits versus linear memory layouts versus memory-bank conflicts; about virtual versus real memory; about tape versus rotating disk versus RAID; and so on. If your performance is dominated by CPU activity, with I/O and memory loafing along, big-O is your primary concern. If your CPU is at 5% and the network is at 100%, maybe you can get away from big-O and work on I/O, caching, and so on.
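
    As a rough sketch of the cache-versus-layout point, with an illustrative matrix size: both loops below do the same O(n²) work, but one walks memory sequentially while the other strides across it.

        #include <chrono>
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        int main() {
            const std::size_t n = 2048;            // illustrative size (~32 MB)
            std::vector<double> m(n * n, 1.0);     // row-major storage

            auto time_sum = [&](bool by_rows) {
                auto t0 = std::chrono::steady_clock::now();
                double sum = 0.0;
                for (std::size_t i = 0; i < n; ++i)
                    for (std::size_t j = 0; j < n; ++j)
                        sum += by_rows ? m[i * n + j]   // sequential, cache-friendly
                                       : m[j * n + i];  // strided, cache-hostile
                auto t1 = std::chrono::steady_clock::now();
                std::printf("%s  sum=%.0f  %.1f ms\n",
                            by_rows ? "row-major   " : "column-major",
                            sum,
                            std::chrono::duration<double, std::milli>(t1 - t0).count());
            };

            time_sum(true);   // same O(n^2) work ...
            time_sum(false);  // ... typically noticeably slower on most machines
        }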

    Multi-threading, particularly with multi-core machines, makes all of that analysis even more complex. That opens up a very extensive discussion of its own. If you are interested, Google can give you months or years of references.

  • 2021-01-31 22:12

    One important thing to realize is that the most common usage of big-O notation (to talk about runtime complexity) is only half of the story - the other half, namely space complexity (which can also be expressed using big-O), can be just as relevant.

    Generally these days, memory capacity advances have outpaced computing speed advances (for a single core - parallelization can get around this), so less focus is given to space complexity, but it is still a factor that should be kept in mind, especially on machines with limited memory or when working with very large amounts of data.
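
    As a small sketch of that trade-off, assuming a duplicate-removal task (purely illustrative): one version spends O(n) extra space on a hash set, the other works in place after an O(n log n) sort.

        #include <algorithm>
        #include <unordered_set>
        #include <vector>

        // O(n) extra space for the hash set, expected O(n) time,
        // and the original order of first occurrences is preserved.
        std::vector<int> dedupe_hash(const std::vector<int>& in) {
            std::unordered_set<int> seen;
            std::vector<int> out;
            for (int x : in)
                if (seen.insert(x).second) out.push_back(x);
            return out;
        }

        // No auxiliary structures (O(1) extra space), O(n log n) time,
        // but the input is reordered.
        void dedupe_sort(std::vector<int>& v) {
            std::sort(v.begin(), v.end());
            v.erase(std::unique(v.begin(), v.end()), v.end());
        }

        int main() {
            std::vector<int> v{3, 1, 3, 2, 1};
            std::vector<int> a = dedupe_hash(v);   // {3, 1, 2}
            dedupe_sort(v);                        // v becomes {1, 2, 3}
            return a.size() == v.size() ? 0 : 1;
        }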

  • 2021-01-31 22:12

    There are no broken assumptions that I see. Big-O notation is a measure of algorithmic complexity on a very, very simplified, idealized computing machine, ignoring constant factors. Obviously it is not the final word on actual speeds on actual machines.

  • 2021-01-31 22:24

    You only need to re-examine your algorithms when your customers complain about the slowness of your program or when it is missing critical deadlines. Otherwise focus on correctness, robustness, readability, and ease of maintenance. Until those items are achieved, any performance optimization is a waste of development time.

    Page faults and disk operations may be platform-specific. Always profile your code to see where the bottlenecks are. Spending time on those areas will produce the most benefit.

    If you're interested, along with page faults and slow disk operations, you may want to be aware of:

    • Cache hits -- Data Oriented Design (see the sketch after this list)
    • Cache hits -- Reducing unnecessary branches / jumps.
    • Cache prediction -- Shrink loops so they fit into the processor's cache.
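
    A rough sketch of the Data Oriented Design bullet, with made-up field names: a hot loop that only reads positions benefits from a struct-of-arrays layout, because an array-of-structs layout drags the unused fields through the cache as well.

        #include <vector>

        // Array of structs: position, velocity and health travel together, so a
        // loop that only needs x still pulls everything through the cache.
        struct Particle { float x, y, z, vx, vy, vz; int health; };

        float sum_x_aos(const std::vector<Particle>& ps) {
            float s = 0.0f;
            for (const Particle& p : ps) s += p.x;
            return s;
        }

        // Struct of arrays: each field is contiguous on its own, so the same loop
        // streams exactly the bytes it needs.
        struct Particles { std::vector<float> x, y, z, vx, vy, vz; std::vector<int> health; };

        float sum_x_soa(const Particles& ps) {
            float s = 0.0f;
            for (float x : ps.x) s += x;
            return s;
        }

        int main() {
            std::vector<Particle> aos(1000, Particle{1, 0, 0, 0, 0, 0, 100});
            Particles soa;
            soa.x.assign(1000, 1.0f);
            return sum_x_aos(aos) == sum_x_soa(soa) ? 0 : 1;
        }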

    Again, address these items only after quality has been achieved, customers have complained, and a profiler has analyzed your program.
