Implementing an efficient sliding-window algorithm in Haskell

泪湿孤枕 提交于 2019-11-29 02:01:51

You can use Seq from Data.Sequence, which has O(1) enqueue and dequeue at both ends:

import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import Data.Sequence ((|>))

windows :: Int -> [a] -> [[a]]
windows n0 = go 0 Seq.empty
  where
    go n s (a:as) | n' <  n0   =              go n' s'  as
                  | n' == n0   = toList s'  : go n' s'  as
                  | otherwise =  toList s'' : go n  s'' as
      where
        n'  = n + 1         -- O(1)
        s'  = s |> a        -- O(1)
        s'' = Seq.drop 1 s' -- O(1)
    go _ _ [] = []

Note that if you materialize the entire result your algorithm is necessarily O(N*M) since that is the size of your result. Using Seq just improves performance by a constant factor.

Example use:

>>> windows [1..5]
[[1,2,3],[2,3,4],[3,4,5]]

You can't get better than O(m*n), since this is the size of the output data structure.

But you can avoid checking the lengths of the windows if you reverse the order of operations: First create n shifted lists and then just zip them together. Zipping will get rid of those that don't have enough elements automatically.

import Control.Applicative
import Data.Traversable (sequenceA)
import Data.List (tails)

transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList

Zipping a list of lists is just a transposition, but unlike transpose from Data.List it throws away outputs that would have less than n elements.

Now it's easy to make the window function: Take m lists, each shifted by 1, and just zip them:

windows :: Int -> [a] -> [[a]]
windows m = transpose' . take m . tails

Works also for infinite lists.

First let's get the windows without worrying about the short ones at the end:

import Data.List (tails)

windows' :: Int -> [a] -> [[a]]
windows' n = map (take n) . tails

> windows' 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5],[4,5],[5],[]]

Now we want to get rid of the short ones without checking the length of every one.

Since we know they are at the end, we could lose them like this:

windows n xs = take (length xs - n + 1) (windows' n xs)

But that's not great since we still go through xs an extra time to get its length. It also doesn't work on infinite lists, which your original solution did.

Instead let's write a function for using one list as a ruler to measure the amount to take from another:

takeLengthOf :: [a] -> [b] -> [b]
takeLengthOf = zipWith (flip const)

> takeLengthOf ["elements", "get", "ignored"] [1..10]
[1,2,3]

Now we can write this:

windows :: Int -> [a] -> [[a]]
windows n xs = takeLengthOf (drop (n-1) xs) (windows' n xs)

> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]

Works on infinite lists too:

> take 5 (windows 3 [1..])
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]

As Gabriel Gonzalez says, the time complexity is no better if you want to use the whole result. But if you only use some of the windows, we now manage to avoid doing the work of take and length on the ones you don't use.

If you want O(1) length then why not use a structure that provides O(1) length? Assuming you aren't looking for windows from an infinite list, consider using:

import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.List(unfoldr) 

windows :: Int -> [a] -> [[a]]
windows n = map V.toList . unfoldr go . V.fromList
 where                    
  go xs | V.length xs < n = Nothing
        | otherwise =
            let (a,b) = V.splitAt n xs
            in Just (a,b)

Conversation of each window from a vector to a list might bite you some, I won't hazard an optimistic guess there, but I will bet that the performance is better than the list-only version.

For the sliding window I also used unboxed Vetors as length, take, drop as well as splitAt are O(1) operations.

The code from Thomas M. DuBuisson is a by n shifted window, not a sliding, except if n =1. Therefore a (++) is missing, however this has a cost of O(n+m). Therefore careful, where you put it.

 import qualified Data.Vector.Unboxed as V
 import Data.Vector.Unboxed (Vector)
 import Data.List

 windows :: Int -> Vector Double -> [[Int]]
 windows n = (unfoldr go) 
  where                    
   go !xs | V.length xs < n = Nothing
          | otherwise =
             let (a,b) = V.splitAt 1 xs
                  c= (V.toList a ++V.toList (V.take (n-1) b))
             in (c,b)

I tried it out with +RTS -sstderr and:

 putStrLn $ show (L.sum $ L.concat $  windows 10 (U.fromList $ [1..1000000]))

and got real time 1.051s and 96.9% usage, keeping in mind that after the sliding window two O(m) operations are performed.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!