How do I implement `cat` in Haskell?

悲哀的现实 2021-01-11 12:24

I am trying to write a simple cat program in Haskell. I would like to take multiple filenames as arguments and write each file sequentially to stdout.

4 Answers
  • 2021-01-11 12:33

    Your problem is that exitWith terminates the whole program. That means you cannot use forever to loop through the file: you don't want to run the function "forever", only until the end of the file. You can rewrite catHandle like this:

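    import System.IO (Handle, hClose, hGetLine, hIsEOF)

    catHandle :: Handle -> IO ()
    catHandle h = do
        eof <- hIsEOF h
        if eof
            then hClose h
            else do
                -- Copy one line to stdout, then recurse for the next one.
                hGetLine h >>= putStrLn
                catHandle h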
    

    I.e. if we haven't reached EOF, we recurse and read another line.
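
    For context, here is a minimal sketch of how such a recursive reader could be wired into a complete program. The name copyLines is made up for this example, and it assumes the handle is opened and closed by withFile rather than closed inside the loop:

```haskell
import Control.Monad (forM_, unless)
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- copyLines is a hypothetical helper name for this sketch.
-- It copies a handle to stdout line by line and returns at end of file.
copyLines :: Handle -> IO ()
copyLines h = do
    eof <- hIsEOF h
    unless eof $ do
        hGetLine h >>= putStrLn
        copyLines h

main :: IO ()
main = do
    files <- getArgs
    -- withFile closes each handle once copyLines returns, even on an exception.
    forM_ files $ \file -> withFile file ReadMode copyLines
```

    Because this goes through hGetLine and putStrLn, a file whose last line has no trailing newline gets one added, so the output is not always byte-for-byte identical to the input.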

    However, this whole approach is overly complicated. You can write cat simply as

    import Control.Monad (forM_)
    import System.Environment (getArgs)

    main :: IO ()
    main = do
        files <- getArgs
        forM_ files $ \filename -> do
            contents <- readFile filename
            putStr contents
    

    Because readFile uses lazy I/O, the file contents are not loaded into memory all at once; they are read on demand as putStr writes them to stdout.

    If you are comfortable with the operators from Control.Monad, the whole program can be shortened down to

    main = getArgs >>= mapM_ (readFile >=> putStr)
    
  • 2021-01-11 12:37

    If you install the very helpful conduit package, you can do it this way:

    module Main where
    
    import Control.Monad
    import Data.Conduit
    import Data.Conduit.Binary
    import System.Environment
    import System.IO
    
    main :: IO ()
    main = do files <- getArgs
              forM_ files $ \filename -> do
                runResourceT $ sourceFile filename $$ sinkHandle stdout
    

    This looks similar to shang's suggested simple solution, but it uses conduits and ByteString instead of lazy I/O and String. Lazy I/O and String are both worth learning to avoid: lazy I/O frees resources at unpredictable times, and String has a lot of memory overhead.

    Note that ByteString is intended to represent binary data, not text. In this case we're just treating the files as uninterpreted sequences of bytes, so ByteString is fine to use. If, on the other hand, we were processing the file as text (counting characters, parsing, etc.), we'd want to use Data.Text.

    EDIT: You can also write it like this:

    main :: IO ()
    main = getArgs >>= catFiles
    
    type Filename = String
    
    catFiles :: [Filename] -> IO ()
    catFiles files = runResourceT $ mapM_ sourceFile files $$ sinkHandle stdout
    

    In the original, sourceFile filename creates a Source that reads from the named file; and we use forM_ on the outside to loop over each argument and run the ResourceT computation over each filename.

    However, in conduit you can use monadic >> to concatenate sources: source1 >> source2 is a source that produces the elements of source1 until it's done, then produces those of source2. So in this second example, mapM_ sourceFile files is equivalent to sourceFile file0 >> ... >> sourceFile filen, which is a single Source that concatenates all of the sources.
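
    As far as I know, newer conduit releases (1.3 and later) deprecate $$ and $= in favor of .| together with runConduit. A sketch of the same concatenation idea in that style, assuming the Conduit module from conduit 1.3 or later is available, might look like this:

```haskell
-- Assumes conduit >= 1.3, where the Conduit module exports these names.
import Conduit (runConduitRes, sourceFile, stdoutC, (.|))
import Data.Foldable (traverse_)
import System.Environment (getArgs)

-- catFiles mirrors the function of the same name in the answer above.
-- traverse_ sequences the per-file sources, so their chunks are
-- emitted back to back, just like chaining them with >>.
catFiles :: [FilePath] -> IO ()
catFiles files = runConduitRes $ traverse_ sourceFile files .| stdoutC

main :: IO ()
main = getArgs >>= catFiles
```

    Here runConduitRes plays the role of runResourceT combined with $$, and stdoutC stands in for sinkHandle stdout.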

    EDIT 2: And following Dan Burton's suggestion in the comment to this answer:

    module Main where
    
    import Control.Monad
    import Control.Monad.IO.Class
    import Data.ByteString (ByteString)
    import Data.Conduit
    import Data.Conduit.Binary
    import System.Environment
    import System.IO
    
    main :: IO ()
    main = runResourceT $ sourceArgs $= readFileConduit $$ sinkHandle stdout
    
    -- | A Source that generates the result of getArgs.
    sourceArgs :: MonadIO m => Source m String
    sourceArgs = do args <- liftIO getArgs
                    forM_ args yield
    
    type Filename = String          
    
    -- | A Conduit that takes filenames as input and produces the concatenated 
    -- file contents as output.
    readFileConduit :: MonadResource m => Conduit Filename m ByteString
    readFileConduit = awaitForever sourceFile
    

    In English, sourceArgs $= readFileConduit is a source that produces the contents of the files named by the command line arguments.

  • 2021-01-11 12:48

    catHandle, which is indirectly called from catFileArray, calls exitWith when it reaches the end of the first file. This terminates the program, and further files aren't read anymore.

    You should instead just return normally from the catHandle function when the end of the file has been reached. This probably means you shouldn't wrap the reading in forever.

  • 2021-01-11 12:49

    My first idea is this:

    import System.Environment
    import System.IO
    import Control.Monad
    main = getArgs >>= mapM_ (\name -> readFile name >>= putStr)
    

    It doesn't really fail in a Unix-y way, and it handles neither stdin nor multibyte input, but it is "way more Haskell", so I just wanted to share it. Hope it helps.

    On the other hand, I guess it should handle large files easily without filling up memory, because putStr can consume the string while the file is still being read.
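
    The gaps mentioned above (no stdin, no Unix-style failure, no byte-exact output) could be closed roughly as follows. This is only a sketch: catOne is a name invented here, treating "-" as standard input is a convention borrowed from the usual cat, and it assumes the bytestring package that ships with GHC:

```haskell
import Control.Exception (IOException, try)
import Control.Monad (unless)
import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- catOne is a hypothetical helper: it copies one input to stdout as raw
-- bytes, where "-" means standard input. It returns False on an I/O error.
catOne :: FilePath -> IO Bool
catOne path = do
    result <- try (source >>= BL.putStr) :: IO (Either IOException ())
    case result of
        Right () -> return True
        Left err -> do
            -- Report the problem on stderr and carry on with the next file.
            prog <- getProgName
            hPutStrLn stderr (prog ++ ": " ++ show err)
            return False
  where
    source
        | path == "-" = BL.getContents
        | otherwise   = BL.readFile path

main :: IO ()
main = do
    args <- getArgs
    -- With no arguments, read standard input.
    oks <- mapM catOne (if null args then ["-"] else args)
    -- Exit with a non-zero status if any input failed.
    unless (and oks) exitFailure
```

    Lazy ByteString keeps the streaming behaviour of the String version while passing bytes through unchanged, so multibyte encodings and binary files are not decoded or re-encoded along the way.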
