Implementing an efficient sliding-window algorithm in Haskell

Question

I needed an efficient sliding window function in Haskell, so I wrote the following:

windows _ [] = []  -- base case; without it, e.g. windows 1 [1] hits a pattern-match failure
windows n xz@(_:xs)
  | length v < n = []
  | otherwise = v : windows n xs
  where
    v = take n xz

My problem with this is that I think the complexity is O(n*m), where m is the length of the list and n is the window size. Each step walks n elements once for take and again for length, and this is repeated roughly m-n times as the function moves down the list. It seems like it could be more efficient than this, but I'm at a loss for how to make it closer to linear. Any takers?

Answer 1:

You can use Seq from Data.Sequence, which has O(1) enqueue and dequeue at both ends:

import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import Data.Sequence ((|>))

windows :: Int -> [a] -> [[a]]
windows n0 = go 0 Seq.empty
  where
    go n s (a:as) | n' <  n0   =              go n' s'  as
                  | n' == n0   = toList s'  : go n' s'  as
                  | otherwise =  toList s'' : go n  s'' as
      where
        n'  = n + 1         -- O(1)
        s'  = s |> a        -- O(1)
        s'' = Seq.drop 1 s' -- O(1)
    go _ _ [] = []

Note that if you materialize the entire result your algorithm is necessarily O(N*M) since that is the size of your result. Using Seq just improves performance by a constant factor.

Example use:

>>> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
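
The size argument in the note above can be checked with a little arithmetic: a list of length m has m - n + 1 windows of n elements each, so the fully materialized result holds n * (m - n + 1) elements. A minimal sketch, assuming a simple reference implementation (windowsRef and outputSize are hypothetical names used only for this illustration, not part of the answer):

```haskell
import Data.List (tails)

-- Hypothetical reference implementation, used only to count output elements.
-- It keeps a prefix of each tail only when that prefix has exactly n elements.
windowsRef :: Int -> [a] -> [[a]]
windowsRef n xs = [w | t <- tails xs, let w = take n t, length w == n]

-- Total number of elements across all windows.
outputSize :: Int -> [a] -> Int
outputSize n = sum . map length . windowsRef n

-- outputSize 3 [1..10] == 24, i.e. 3 * (10 - 3 + 1)
```

Whatever data structure produces the windows, writing all of them out as lists means producing that many elements.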

Answer 2:

You can't get better than O(m*n), since this is the size of the output data structure.

But you can avoid checking the lengths of the windows if you reverse the order of operations: First create n shifted lists and then just zip them together. Zipping will get rid of those that don't have enough elements automatically.

import Control.Applicative
import Data.Traversable (sequenceA)
import Data.List (tails)

transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList

Zipping a list of lists is just a transposition, but unlike transpose from Data.List it throws away outputs that would have fewer than n elements.
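
To make the difference from Data.List's transpose concrete, here is a small sketch on three shifted lists of unequal length (shifted is a hypothetical sample value chosen for this illustration):

```haskell
import Control.Applicative (ZipList (..))
import Data.List (transpose)

-- Same definition as above: zip a list of lists together.
transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList

-- Hypothetical sample input: three shifted lists of decreasing length.
shifted :: [[Int]]
shifted = [[1, 2, 3, 4], [2, 3, 4], [3, 4]]

-- transpose' shifted == [[1,2,3],[2,3,4]]
-- transpose  shifted == [[1,2,3],[2,3,4],[3,4],[4]]
```

The zip stops at the shortest input, so only the rows that have one element from every list survive.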

Now it's easy to make the window function: Take m lists, each shifted by 1, and just zip them:

windows :: Int -> [a] -> [[a]]
windows m = transpose' . take m . tails

This also works for infinite lists.
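
A short sketch of the infinite-list case, repeating the definitions so it stands alone (firstThree is a hypothetical name for the example value):

```haskell
import Control.Applicative (ZipList (..))
import Data.List (tails)

transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList

windows :: Int -> [a] -> [[a]]
windows m = transpose' . take m . tails

-- Only the first three windows of an infinite list are demanded.
firstThree :: [[Integer]]
firstThree = take 3 (windows 3 [1 ..])

-- firstThree == [[1,2,3],[2,3,4],[3,4,5]]
```

This relies on zipping being lazy: each window is built only when it is asked for.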

Answer 3:

First let's get the windows without worrying about the short ones at the end:

import Data.List (tails)

windows' :: Int -> [a] -> [[a]]
windows' n = map (take n) . tails

> windows' 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5],[4,5],[5],[]]

Now we want to get rid of the short ones without checking the length of every one.

Since we know they are at the end, we could lose them like this:

windows n xs = take (length xs - n + 1) (windows' n xs)

But that's not great since we still go through xs an extra time to get its length. It also doesn't work on infinite lists, which your original solution did.

Instead let's write a function for using one list as a ruler to measure the amount to take from another:

takeLengthOf :: [a] -> [b] -> [b]
takeLengthOf = zipWith (flip const)

> takeLengthOf ["elements", "get", "ignored"] [1..10]
[1,2,3]

Now we can write this:

windows :: Int -> [a] -> [[a]]
windows n xs = takeLengthOf (drop (n-1) xs) (windows' n xs)

> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]

Works on infinite lists too:

> take 5 (windows 3 [1..])
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]

As Gabriel Gonzalez says, the time complexity is no better if you want to use the whole result. But if you only use some of the windows, we now manage to avoid doing the work of take and length on the ones you don't use.
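
One way to see that unused windows cost nothing is to hand the function a list whose tail is undefined and demand only the first window. This is a sketch for illustration, with the definitions repeated so it stands alone (firstOnly is a hypothetical name):

```haskell
import Data.List (tails)

windows' :: Int -> [a] -> [[a]]
windows' n = map (take n) . tails

takeLengthOf :: [a] -> [b] -> [b]
takeLengthOf = zipWith (flip const)

windows :: Int -> [a] -> [[a]]
windows n xs = takeLengthOf (drop (n - 1) xs) (windows' n xs)

-- The list is undefined after its third element, yet asking for just the
-- first window succeeds because nothing beyond it is ever evaluated.
firstOnly :: [[Int]]
firstOnly = take 1 (windows 3 (1 : 2 : 3 : undefined))

-- firstOnly == [[1,2,3]]
```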

Answer 4:

If you want O(1) length then why not use a structure that provides O(1) length? Assuming you aren't looking for windows from an infinite list, consider using:

import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.List (unfoldr)

windows :: Int -> [a] -> [[a]]
windows n = map V.toList . unfoldr go . V.fromList
 where                    
  go xs | V.length xs < n = Nothing
        | otherwise =
            let (a,b) = V.splitAt n xs
            in Just (a,b)

Conversion of each window from a vector to a list might bite you some, and I won't hazard an optimistic guess there, but I will bet that the performance is better than the list-only version.

Answer 5:

For the sliding window I also used unboxed Vectors, since length, take, drop and splitAt are all O(1) operations on them.

The code from Thomas M. DuBuisson produces windows shifted by n (non-overlapping chunks), not a sliding window, unless n = 1. A (++) is therefore needed, but it has a cost of O(n+m), so be careful where you put it.

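{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector)
import Data.List (unfoldr)

windows :: Int -> Vector Double -> [[Double]]
windows n = unfoldr go
  where
    go !xs
      | V.length xs < n = Nothing
      | otherwise =
          let (a, b) = V.splitAt 1 xs
              -- the first element plus the next n-1 elements form one window
              c = V.toList a ++ V.toList (V.take (n - 1) b)
          in Just (c, b)  -- continue from b, i.e. shifted by one element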

I tried it out with +RTS -sstderr and:

putStrLn $ show (sum $ concat $ windows 10 (V.fromList [1..1000000]))

and got real time 1.051s and 96.9% usage, keeping in mind that after the sliding window two O(m) operations are performed.
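
The distinction drawn above between windows shifted by n and a true sliding window can be seen in a small list-based sketch. It uses plain lists rather than vectors to stay dependency-free, it assumes n >= 1, and chunked and sliding are hypothetical names for this illustration:

```haskell
import Data.List (unfoldr)

-- Hypothetical name: advances by n each step, as splitAt n does,
-- so the windows do not overlap. Assumes n >= 1.
chunked :: Int -> [a] -> [[a]]
chunked n = unfoldr go
  where
    go xs
      | length w < n = Nothing
      | otherwise = Just (w, rest)
      where
        (w, rest) = splitAt n xs

-- Hypothetical name: advances by 1 each step, so consecutive
-- windows overlap in n-1 elements. Assumes n >= 1.
sliding :: Int -> [a] -> [[a]]
sliding n = unfoldr go
  where
    go xs
      | length w < n = Nothing
      | otherwise = Just (w, drop 1 xs)
      where
        w = take n xs

-- chunked 3 [1..7] == [[1,2,3],[4,5,6]]
-- sliding 3 [1..5] == [[1,2,3],[2,3,4],[3,4,5]]
```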

Source: https://stackoverflow.com/questions/27726739/implementing-an-efficient-sliding-window-algorithm-in-haskell

Tags

haskell

time-complexity

sliding-window