I have a haskell code to resolve a Fast Fourier Transformation, and i want to apply data parallelism on it. However, every strategy that i use generate too many sparks and most
First off: there's a lot of optimisation to be done here before I'd start to think about parallelism:
Lists rock, but their non-consecutive memory model means they just can't allow for traversals nearly as fast as what you can achieve with tight arrays such as Data.Vector
, because you inevitably end up with lots of cache misses. Indeed I've seldom seen a list-based algorithm to gain much from parallelisation, because they're bound by memory- rather than CPU performance.
Your twiddle factors are computed over and over again, you can obviously gain a lot through memoisation here.
You keep on calling length
, but that's an extremely wasteful function (O (n) for something that could be O (1)). Use some container that probably handles length; lists aren't meant to (we like to keep their ability to be infinite).
The parallelisation itself will be pretty simple; I'd check on the length as suggested by John L, indeed I'd rather require a pretty large size before sparking a thread, at least something like 256: as the performance probably becomes crucial only at sizes of several thousands, this should sill be enough threads to keep your cores busy.
import Data.Vector.Unboxed as UBV
import Control.Parallel.Strategies
type ℂ = Complex Float
fft' :: UBV.Vector ℂ -> UBV.Vector ℂ
fft' aₓs = interleave lᵥs rᵥs
where (lᵥs, rᵥs) = (fft lₓs, fft rₓs)
`using` if UBV.length aₓs > 256 then parTuple2 else r0
(lₓs, rₓs) = byflyS aₓs