I implemented a version of this answer https://stackoverflow.com/a/9920425/1261166 (I don\'t know what was intended by the person answering)
sublistofsize 0
Your implementation is the natural "Haskell-ish" one for that problem.
If you end up using the entire result, then there won't be anything asymptotically faster for this problem given the output datastructure ([[a]]
) because it runs in time linear in the length of the output.
The use of map (x:)
is a very natural way to add an element onto the start of each list and there's unlikely to be any significantly faster options given that we are working with lists.
In principle the repeated use of (++)
is inefficient as it causes the left-hand argument to be traversed each time it is called, but the total cost in this case should only be an extra constant factor.
You might be able to improve it with the use of an accumulating parameter otherResults
to collect the results, but to make this change you also need to pass down prefix
in reversed order and re-reverse it at the end, which could well eat up the savings:
sublistofsize' 0 _ prefix otherResults = reverse prefix : otherResults
sublistofsize' _ [] prefix otherResults = otherResults
sublistofsize' n (x : xs) prefix otherResults =
sublistofsize' (n-1) xs (x:prefix) (sublistofsize' n xs prefix otherResults)
sublistofsize n xs = sublistofsize' n xs [] []
An optimization which should help is to keep track of whether there are enough elements in the list to form the rest of the subsequence. This can be done very efficiently by keeping track of a pointer which is n-1
-elements ahead of xs
and advancing them both as you recurse.
An implementation:
nthtail 0 xs = xs
nthtail _ [] = []
nthtail n (x:xs) = nthtail (n-1) xs
subseq 0 _ = [[]]
subseq n xs =
if null t
then []
else go n xs t
where
t = nthtail (n-1) xs -- n should always be >= 1 here
go 0 _ _ = [[]]
go _ _ [] = []
go n xs@(x:xt) t = withx ++ withoutx
where withx = map (x:) $ go (n-1) xt t
withoutx = go n xt (tail t)
I assume that map (x:) gives a problem performance wise
No. map
is coded efficiently and runs in linear time, no problems here.
However, your recursion might be a problem. You're both calling sublistofsize (n-1) xs
and sublistofsize n xs
, which - given a start list sublistofsize m (_:_:ys)
- does evaluate the term sublistofsize (m-1) ys
twice, as there is no sharing between them in the different recursive steps.
So I'd apply dynamic programming to get
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
Not that appending the empty lists is the most beautiful solution, but you can see how I have used zipWith
with the displaced lists so that the results from next
are used twice - once directly in the list of subsequences of length n and once in the list of subsequences of length n+1.
Testing it in GHCI with :set +s
, you can see how this is drastically faster than the naive solutions:
*Main> length $ subsequencesOfSize 7 [1..25]
480700
(0.25 secs, 74132648 bytes)
(0.28 secs, 73524928 bytes)
(0.30 secs, 73529004 bytes)
*Main> length $ sublistofsize 7 [1..25] -- @Vixen (question)
480700
(3.03 secs, 470779436 bytes)
(3.35 secs, 470602932 bytes)
(3.14 secs, 470747656 bytes)
*Main> length $ sublistofsize' 7 [1..25] -- @Ganesh
480700
(2.00 secs, 193610388 bytes)
(2.00 secs, 193681472 bytes)
*Main> length $ subseq 7 [1..25] -- @user5402
480700
(3.07 secs, 485941092 bytes)
(3.07 secs, 486279608 bytes)
This is a 6 years old topic but i believe i have a code worth sharing here.
The accepted answer by @Bergi is just super but still i think this job can be done better as seen from two aspects;
.
combinationsOf :: Int -> [a] -> [[a]]
combinationsOf 1 as = map pure as
combinationsOf k as@(x:xs) = run (l-1) (k-1) as $ combinationsOf (k-1) xs
where
l = length as
run :: Int -> Int -> [a] -> [[a]] -> [[a]]
run n k ys cs | n == k = map (ys ++) cs
| otherwise = map (q:) cs ++ run (n-1) k qs (drop dc cs)
where
(q:qs) = take (n-k+1) ys
dc = product [(n-k+1)..(n-1)] `div` product [1..(k-1)]
Lets compare them against the test case under the accepted answer.
*Main> length $ subsequencesOfSize 7 [1..25]
480700
(0.27 secs, 145,572,672 bytes)
*Main> length $ combinationsOf 7 [1..25]
480700
(0.14 secs, 95,055,360 bytes)
Let us test them against something harder like C(100,5)
*Main> length $ subsequencesOfSize 5 [1..100]
75287520
(52.01 secs, 77,942,823,360 bytes)
*Main> length $ combinationsOf 5 [1..100]
75287520
(17.61 secs, 11,406,834,912 bytes)