subsequences of length n from list performance

借酒劲吻你 2020-12-05 22:07

I implemented a version of this answer https://stackoverflow.com/a/9920425/1261166 (I don't know what was intended by the person answering)

sublistofsize 0          


        
4 answers
  • 2020-12-05 22:30

    Your implementation is the natural "Haskell-ish" one for that problem.

    If you end up using the entire result, then there won't be anything asymptotically faster for this problem given the output data structure ([[a]]), because it runs in time linear in the length of the output.

    The use of map (x:) is a very natural way to add an element onto the start of each list, and there are unlikely to be any significantly faster options given that we are working with lists.

    In principle the repeated use of (++) is inefficient as it causes the left-hand argument to be traversed each time it is called, but the total cost in this case should only be an extra constant factor.
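
    For reference, here is a minimal sketch of how list append behaves. It uses a hypothetical function `append` that mirrors the standard definition of (++) from the Haskell report. It walks the whole left-hand list and reuses the right-hand list unchanged, which is where the repeated traversal cost comes from:

```haskell
-- Hypothetical stand-in for (++), named `append` to avoid clashing with the Prelude.
append :: [a] -> [a] -> [a]
append []     ys = ys                -- left list exhausted: share ys as-is
append (x:xs) ys = x : append xs ys  -- one new cons cell per element of the left list
```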

    You might be able to improve it with the use of an accumulating parameter otherResults to collect the results, but to make this change you also need to pass down prefix in reversed order and re-reverse it at the end, which could well eat up the savings:

    sublistofsize' 0 _        prefix otherResults = reverse prefix : otherResults
    sublistofsize' _ []       prefix otherResults = otherResults
    sublistofsize' n (x : xs) prefix otherResults =
       sublistofsize' (n-1) xs (x:prefix) (sublistofsize' n xs prefix otherResults)
    
    sublistofsize n xs = sublistofsize' n xs [] []
    
  • 2020-12-05 22:30

    An optimization which should help is to keep track of whether there are enough elements left in the list to form the rest of the subsequence. This can be done very efficiently by keeping a pointer that is n-1 elements ahead of xs and advancing both as you recurse.

    An implementation:

      nthtail 0 xs = xs
      nthtail _ [] = []
      nthtail n (x:xs) = nthtail (n-1) xs
    
      subseq 0 _ = [[]]
      subseq n xs =
        if null t
          then []
          else go n xs t
        where
          t = nthtail (n-1) xs  -- n should always be >= 1 here
          go 0 _ _  =  [[]]
          go _ _ [] = []
          go n xs@(x:xt) t = withx ++ withoutx
            where withx = map (x:) $ go (n-1) xt t
                  withoutx = go n xt (tail t)
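
    The look-ahead idea can also be seen in isolation. The sketch below uses a hypothetical helper `hasAtLeast` (not part of the code above) that checks whether a list has at least n elements by inspecting only its first n cells, rather than calling length on the whole list:

```haskell
-- Hypothetical helper: True when xs has at least n elements.
-- Only the first n cells of xs are forced, so it also terminates on infinite lists.
hasAtLeast :: Int -> [a] -> Bool
hasAtLeast n xs
  | n <= 0    = True
  | otherwise = not (null (drop (n - 1) xs))
```

    In subseq, the second list t plays this role: once t is empty, fewer than n elements remain and the branch can be cut off immediately.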
    
  • 2020-12-05 22:41

    I assume that map (x:) gives a problem performance wise

    No. map is coded efficiently and runs in linear time, so there is no problem here.

    However, your recursion might be a problem. You call both sublistofsize (n-1) xs and sublistofsize n xs, so a call of the form sublistofsize m (_:_:ys) evaluates the term sublistofsize (m-1) ys twice, because nothing is shared between the different recursive steps.
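
    To make the duplicated work concrete, here is a minimal sketch of the kind of recursion being described. The question's code is not shown in full here, so the definition below, under the hypothetical name `naiveSublists`, is an assumption reconstructed from the two recursive calls mentioned above:

```haskell
-- Assumed shape of the naive recursion (hypothetical name, not the asker's exact code).
naiveSublists :: Int -> [a] -> [[a]]
naiveSublists 0 _      = [[]]
naiveSublists _ []     = []
naiveSublists n (x:xs) =
  map (x:) (naiveSublists (n - 1) xs)  -- subsequences that include x
    ++ naiveSublists n xs              -- subsequences that skip x

-- Unfolding two steps on (a:b:ys) with size m:
--   taking a, skipping b  needs  naiveSublists (m-1) ys
--   skipping a, taking b  needs  naiveSublists (m-1) ys  (computed again, not shared)
```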

    So I'd apply dynamic programming to get

    subsequencesOfSize :: Int -> [a] -> [[a]]
    subsequencesOfSize n xs = let l = length xs
                              in if n>l then [] else subsequencesBySize xs !! (l-n)
     where
       subsequencesBySize [] = [[[]]]
       subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                                 in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
    

    Appending the empty lists is not the most elegant solution, but you can see how I have used zipWith with the displaced lists so that the results from next are used twice: once directly in the list of subsequences of length n and once in the list of subsequences of length n+1.
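
    A small worked example may make the shifted zipWith easier to follow. The sketch below lifts subsequencesBySize to the top level (in the code above it is a local definition) and adds a type signature, which is an addition for illustration. The result is a list of groups ordered from the longest subsequences down to the empty one, which is why it is indexed with l-n:

```haskell
-- Same definition as the local helper, lifted to the top level for experimentation.
subsequencesBySize :: [a] -> [[[a]]]
subsequencesBySize []     = [[[]]]
subsequencesBySize (x:xs) =
  let next = subsequencesBySize xs
  in  zipWith (++) ([] : next) (map (map (x:)) next ++ [[]])

-- subsequencesBySize "abc"
--   == [["abc"], ["bc","ac","ab"], ["c","b","a"], [""]]
--       size 3    size 2            size 1         size 0
```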

    Testing it in GHCi with :set +s, you can see how this is drastically faster than the naive solutions:

    *Main> length $ subsequencesOfSize 7 [1..25]
    480700
    (0.25 secs, 74132648 bytes)
    (0.28 secs, 73524928 bytes)
    (0.30 secs, 73529004 bytes)
    *Main> length $ sublistofsize 7 [1..25] -- @Vixen (question)
    480700
    (3.03 secs, 470779436 bytes)
    (3.35 secs, 470602932 bytes)
    (3.14 secs, 470747656 bytes)
    *Main> length $ sublistofsize' 7 [1..25] -- @Ganesh
    480700
    (2.00 secs, 193610388 bytes)
    (2.00 secs, 193681472 bytes)
    *Main> length $ subseq 7 [1..25] -- @user5402
    480700
    (3.07 secs, 485941092 bytes)
    (3.07 secs, 486279608 bytes)
    
  • 2020-12-05 22:44

    This is a six-year-old topic, but I believe I have some code worth sharing here.

    The accepted answer by @Bergi is excellent, but I still think this job can be done better in two respects:

    1. Although no order is specified anywhere, it returns the combinations in reverse lexicographical order. One might prefer to have them in lexicographical order, which is the more usual convention.
    2. When tested with C(n,n/2) the two perform similarly; however, when tested with something like C(100,5), the following code is much faster and more memory efficient.


    -- Precondition: 1 <= k <= length as.
    combinationsOf :: Int -> [a] -> [[a]]
    combinationsOf 1 as        = map pure as
    combinationsOf k as@(_:xs) = run (l-1) (k-1) as $ combinationsOf (k-1) xs
      where
        l = length as

        run :: Int -> Int -> [a] -> [[a]] -> [[a]]
        run n k ys cs
          | n == k    = map (q:) cs  -- here take (n-k+1) ys is just [q]
          | otherwise = map (q:) cs ++ run (n-1) k qs (drop dc cs)
          where
            (q:qs) = take (n-k+1) ys
            dc     = product [(n-k+1)..(n-1)] `div` product [1..(k-1)]
    

    Let's compare them on the test case from the accepted answer.

    *Main> length $ subsequencesOfSize 7 [1..25]
    480700
    (0.27 secs, 145,572,672 bytes)
    
    *Main> length $ combinationsOf 7 [1..25]
    480700
    (0.14 secs, 95,055,360 bytes)
    

    Let us test them against something harder, like C(100,5):

    *Main> length $ subsequencesOfSize 5 [1..100]
    75287520
    (52.01 secs, 77,942,823,360 bytes)
    
    *Main> length $ combinationsOf 5 [1..100]
    75287520
    (17.61 secs, 11,406,834,912 bytes)
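
    The lengths reported in these sessions can be cross-checked against the binomial coefficient C(n,k). A minimal sketch, using a hypothetical helper `choose` over Integer to sidestep overflow:

```haskell
-- Hypothetical helper: number of k-element subsequences of an n-element list.
choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- choose 25 7  == 480700
-- choose 100 5 == 75287520
```

    The dc term in combinationsOf has the same shape: for arguments in the valid range, product [(n-k+1)..(n-1)] `div` product [1..(k-1)] equals choose (n-1) (k-1).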
    