Question
I'm writing a program that reads multiple URLs (one per line) from standard input, slightly adapts them, and makes an HTTP request for each of them in parallel. Responses are printed to standard output. Here's the code:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad
import Network.Wreq
import Control.Concurrent.MSem
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T
main :: IO ()
main = void $ mapPool 4 (const processUrl) [1..]
mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool max f xs = do
  semaphore <- new max
  mapConcurrently (with semaphore . f) xs
processUrl :: IO ()
processUrl = do
  param <- getLine
  response <- get (url ++ param)
  print response
url = "http://example.com/resources?param="
Parallelism is hard-coded here to four. The problem arises when some of the IO actions (HTTP requests) in a batch fail: by design, Control.Concurrent.Async.mapConcurrently cancels the remaining actions as soon as one of them throws. In my case the last batch always seems to fail, because standard input hits EOF, an exception is thrown, and the program outputs:
my-program-exe: <stdin>: hGetLine: end of file
Is there an alternative to mapConcurrently that does not cancel all the other actions when one of them ends with an exception? If not, is there a better way to approach this kind of task?
Answer 1:
Is there an alternative to mapConcurrently that does not cancel all the other actions when one of them ends with an exception?
Here the exception is quite predictable, so perhaps we should handle the problem at the source, for example by checking for EOF before reading each line. We could put that check in an IO (Maybe String) action that uses Nothing to signify EOF.
import System.IO

getLineMaybe :: IO (Maybe String)
getLineMaybe =
  do isEOF <- hIsEOF stdin
     if isEOF
       then return Nothing
       else Just <$> System.IO.getLine
There's another problem with your example: writing to standard output from several threads at once is likely to produce garbled output. Writing to stdout should be done from a single thread, and arguably the same goes for reading from stdin.
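For illustration, here is a minimal sketch of one way to serialize writes: funnel every print through a shared MVar used as a lock. The helper and its name are hypothetical, not part of the solution developed below:

import Control.Concurrent.MVar (MVar, newMVar, withMVar)

-- Hypothetical helper: each thread takes the lock for the duration of
-- one write, so lines from different threads cannot interleave.
putStrLnLocked :: MVar () -> String -> IO ()
putStrLnLocked lock s = withMVar lock (\_ -> putStrLn s)

-- the lock is created once, e.g. in main: lock <- newMVar ()

The answer itself takes a different route: bounded queues with a single writer thread.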
Perhaps we could have two (closeable and bounded) concurrent queues, one in which we put lines read from stdin, and another in which we put processed results to be written later. Connecting one to the other there would be a number of worker threads.
Using the packages async, stm, and stm-chans:
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Concurrent.STM.TVar
import qualified Control.Concurrent.STM.TBMQueue as Q -- closeable, bounded queue
and this helper function:
untilNothing :: IO (Maybe a) -> (a -> IO ()) -> IO () -> IO ()
untilNothing action handler finalizer =
  let go = do mx <- action
              case mx of
                Nothing -> finalizer
                Just x -> do handler x
                             go
  in go
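For example, combined with getLineMaybe from above, this would echo standard input to standard output until EOF (a toy usage just to show the shape; the name echoLines is illustrative):

echoLines :: IO ()
echoLines = untilNothing getLineMaybe putStrLn (putStrLn "<eof>")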
we can write a general function like the following:
data ConcConf = ConcConf
  { pendingQueueSize :: Int
  , doneQueueSize :: Int
  , concurrencyLevel :: Int
  } deriving Show
concPipeline :: ConcConf -> IO (Maybe a) -> (a -> IO b) -> (b -> IO ()) -> IO ()
concPipeline conf reader transformer writer =
  do src <- atomically $ Q.newTBMQueue (pendingQueueSize conf)
     dst <- atomically $ Q.newTBMQueue (doneQueueSize conf)
     workersLeft <- atomically $ newTVar (concurrencyLevel conf)
     let gang = replicateConcurrently_ (concurrencyLevel conf)
         pipeline =
           -- single thread feeding the source queue, closing it at EOF
           untilNothing reader
                        (\a -> atomically $ Q.writeTBMQueue src a)
                        (atomically $ Q.closeTBMQueue src)
           `concurrently_`
           -- single thread draining the destination queue
           untilNothing (atomically $ Q.readTBMQueue dst)
                        writer
                        (pure ())
           `concurrently_`
           -- worker threads connecting reader and writer
           gang (untilNothing (atomically $ Q.readTBMQueue src)
                              (\a -> do b <- transformer a
                                        atomically $ Q.writeTBMQueue dst b)
                              -- last one remaining closes shop
                              (atomically $ do modifyTVar' workersLeft pred
                                               c <- readTVar workersLeft
                                               if c == 0
                                                 then Q.closeTBMQueue dst
                                                 else pure ()))
     pipeline
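To connect this back to the original program, a hypothetical main could wire concPipeline to getLineMaybe, an HTTP fetch, and print (get is wreq's, as in the question; the queue sizes and URL prefix here are assumptions for illustration):

main :: IO ()
main = concPipeline conf getLineMaybe fetch print
  where
    conf = ConcConf { pendingQueueSize = 16
                    , doneQueueSize = 16
                    , concurrencyLevel = 4 }
    fetch param = get ("http://example.com/resources?param=" ++ param)

Because writer runs in the single draining thread, the garbled-stdout problem goes away, and EOF on stdin now shuts the pipeline down cleanly instead of raising an exception.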
Source: https://stackoverflow.com/questions/50306343/using-mapconcurrently-to-read-stdin-make-http-calls-and-write-to-stdout-in-para