Safe and fast FFT

前端 未结 1 370
天涯浪人
天涯浪人 2021-02-02 00:20

Inspired by Herb Sutter\'s compelling lecture Not your father\'s C++, I decided to take another look at the latest version of C++ using Microsoft\'s Visual Studio 2010. I was pa

1条回答
  •  灰色年华
    2021-02-02 00:52

    I think you are being overly "safe" in your use of at(). In most of your cases the index used is trivially verifable as being constrained by the size of the container in the for loop.

    e.g.

      for(unsigned int i=0; i

    The only ones I'd leave as at()s are the (i + j)s. It's not immediately obvious whether they would always be constrained (although if I was really unsure I'd probably manually check - but I'm not familiar with FFTs enough to have an opinion in this case).

    There are also some fixed computations being repeated for each loop iteration:

    int m=zs.size()/2;
    pi * sign
    2*j
    

    And the zs.at(i + j) is computed twice.

    It's possible that the optimiser may catch these - but if you are treating this as performance critical, and you have your timers testing it, I'd hoist them out of the loops (or, in the case of zs.at(i + j), just take a reference on first use) and see if that impacts the timer.

    Talking of second-guessing the optimiser: I'm sure that the calls to .size() will be inlined as, at least, a direct call to an internal member variable - but given how many times you call it I'd also experiment with introducing local variables for zs.size() and zs.size()-1 upfront. They're more likely to be put into registers that way too.

    I don't know how much of a difference (if any) all of this will have on your total runtime - some of it may already be caught by the optimiser, and the differences may be small compared to the computations involved - but worth a shot.

    As for being idiomatic my only comment, really, is that size() returns a std::size_t (which is usually a typedef for an unsigned int - but it's more idiomatic to use that type instead). If you did want to use auto but avoid the warning you could try adding the ul suffix to the 0 - not sure I'd say that is idiomatic, though. I suppose you're already less than idiomatic in not using iterators here, but I can see why you can't do that (easily).

    Update

    I gave all my suggestions a try and they all had a measurable performance improvement - except the i+j and 2*j precalcs - they actually caused a slight slowdown! I presume they either prevented a compiler optimisation or prevented it from using registers for some things.

    Overall I got a >10% perf. improvement with those suggestions. I suspect more could be had if the second block of loops was refactored a little to avoid the jumps - and having done so enabling SSE2 instruction set may give a significant boost (I did try it as is and saw a slight slowdown).

    I think that refactoring, along with using something like MKL for the cos and sin calls should give greater, and less brittle, improvements. And neither of those things would be language dependent (I know this was originally being compared to an F# implementation).

    Update 2

    I forgot to mention that pre-calculating zs.size() did make a difference.

    Update 3

    Also forgot to say (until reminded by @xeo in comment to OP) that the block following the i < j check can be boiled down to a std::swap. This is more idiomatic and at least as performant - in the worst case should inline to the same code as written. Indeed when I did it I saw no change in the performance. In other cases it can lead to a performance gain if move constructors are available.

    0 讨论(0)
提交回复
热议问题