Can Haskell's Control.Concurrent.Async.mapConcurrently have a limit?

后端 未结 5 506
有刺的猬
有刺的猬 2021-02-04 08:45

I\'m attempting to run multiple downloads in parallel in Haskell, which I would normally just use the Control.Concurrent.Async.mapConcurrently function for. However, doing so o

相关标签:
5条回答
  • 2021-02-04 08:55

    A quick solution would be to use a semaphore to restrict the number of concurrent actions. It's not optimal (all threads are created at once and then wait), but works:

    import Control.Concurrent.MSem
    import Control.Concurrent.Async
    import Control.Concurrent (threadDelay)
    import qualified Data.Traversable as T
    
    mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
    mapPool max f xs = do
        sem <- new max
        mapConcurrently (with sem . f) xs
    
    -- A little test:
    main = mapPool 10 (\x -> threadDelay 1000000 >> print x) [1..100]
    
    0 讨论(0)
  • 2021-02-04 09:01

    This is really easy to do using the Control.Concurrent.Spawn library:

    import Control.Concurrent.Spawn
    
    type URL      = String
    type Response = String    
    
    numMaxConcurrentThreads = 4
    
    getURLs :: [URL] -> IO [Response]
    getURLs urlList = do
       wrap <- pool numMaxConcurrentThreads
       parMapIO (wrap . fetchURL) urlList
    
    fetchURL :: URL -> IO Response
    
    0 讨论(0)
  • 2021-02-04 09:02

    If you have actions in a list, this one has less dependencies

    import Control.Concurrent.Async (mapConcurrently)
    import Data.List.Split (chunksOf)
    
    mapConcurrentChunks :: Int -> (a -> IO b) -> [a] -> IO [b]
    mapConcurrentChunks n ioa xs = concat <$> mapM (mapConcurrently ioa) (chunksOf n xs)
    

    Edit: Just shortened a bit

    0 讨论(0)
  • 2021-02-04 09:04

    Chunking the threads may be inefficient if a few of them last significantly longer than the others. Here is a smoother, yet more complex, solution:

    {-# LANGUAGE TupleSections #-}
    import Control.Concurrent.Async (async, waitAny)
    import Data.List                (delete, sortBy)
    import Data.Ord                 (comparing)
    
    concurrentlyLimited :: Int -> [IO a] -> IO [a]
    concurrentlyLimited n tasks = concurrentlyLimited' n (zip [0..] tasks) [] []
    
    concurrentlyLimited' _ [] [] results = return . map snd $ sortBy (comparing fst) results
    concurrentlyLimited' 0 todo ongoing results = do
        (task, newResult) <- waitAny ongoing
        concurrentlyLimited' 1 todo (delete task ongoing) (newResult:results)
    concurrentlyLimited' n [] ongoing results = concurrentlyLimited' 0 [] ongoing results
    concurrentlyLimited' n ((i, task):otherTasks) ongoing results = do
        t <- async $ (i,) <$> task
        concurrentlyLimited' (n-1) otherTasks (t:ongoing) results
    

    Note : the above code could be made more generic using an instance of MonadBaseControl IO in place of IO, thanks to lifted-async.

    0 讨论(0)
  • 2021-02-04 09:07

    You may also try the pooled-io package where you can write:

    import qualified Control.Concurrent.PooledIO.Final as Pool
    import Control.DeepSeq (NFData)
    import Data.Traversable (Traversable, traverse)
    
    mapPool ::
       (Traversable t, NFData b) =>
       Int -> (a -> IO b) -> t a -> IO (t b)
    mapPool n f = Pool.runLimited n . traverse (Pool.fork . f)
    
    0 讨论(0)
提交回复
热议问题