Haskell: Scan Through a List and Apply A Different Function for Each Element


I need to scan through a document and accumulate the output of different functions for each string in the file. The function run on any given line of the file depends on what i

4 Answers

广开言路

2021-02-06 11:56
I show a solution for two types of line, but it is easily extended to five types of line by using a five-tuple instead of a two-tuple.
```
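import qualified Data.ByteString.Char8 as B
import Data.Monoid

-- Atom, Sheet, isAnAtom and isASheet are assumed to be defined elsewhere.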
eachLine :: B.ByteString -> ([Atom], [Sheet])
eachLine bs | isAnAtom bs = ([ {- calculate an Atom -} ], [])
            | isASheet bs = ([], [ {- calculate a Sheet -} ])
            | otherwise = error "eachLine"

allLines :: [B.ByteString] -> ([Atom], [Sheet])
allLines bss = mconcat (map eachLine bss)
```
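The magic is done by mconcat from Data.Monoid (included with GHC): lists are monoids, and a tuple of monoids is itself a monoid, so mconcat concatenates the Atom lists and the Sheet lists component-wise.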
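To make the component-wise behaviour concrete, here is a minimal sketch with three kinds of line. The `tag` function and its "a:" / "s:" prefixes are made up for illustration and are not part of the question.

```haskell
import Data.List (stripPrefix)

-- Hypothetical classifier: lines starting with "a:" go in the first slot,
-- lines starting with "s:" in the second, everything else in the third.
tag :: String -> ([String], [String], [String])
tag line
  | Just rest <- stripPrefix "a:" line = ([rest], [], [])
  | Just rest <- stripPrefix "s:" line = ([], [rest], [])
  | otherwise                          = ([], [], [line])

-- mconcat combines the triples slot by slot, preserving input order.
tagAll :: [String] -> ([String], [String], [String])
tagAll = mconcat . map tag

example :: ([String], [String], [String])
example = tagAll ["a:H", "s:beta", "a:O", "junk"]
-- (["H","O"],["beta"],["junk"])
```

The base library provides Monoid instances for tuples up to five components, which is why the approach stops at a five-tuple.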

(On a point of style: personally I would define a Line type, a parseLine :: B.ByteString -> Line function and write eachLine bs = case parseLine bs of .... But this is peripheral to your question.)
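A rough sketch of that style suggestion follows. The payload types and the "ATOM" / "SHEET" prefixes are assumptions made for the example, since the question does not show the real line format. Unlike the version above, unrecognised lines are skipped rather than raising an error.

```haskell
import qualified Data.ByteString.Char8 as B

-- Placeholder payload types; the real Atom and Sheet would hold parsed fields.
newtype Atom  = Atom B.ByteString  deriving (Eq, Show)
newtype Sheet = Sheet B.ByteString deriving (Eq, Show)

-- One constructor per kind of line, plus a catch-all for anything else.
data Line
  = AtomLine Atom
  | SheetLine Sheet
  | OtherLine B.ByteString
  deriving (Eq, Show)

-- Assumed format: the record type is a literal prefix of the line.
parseLine :: B.ByteString -> Line
parseLine bs
  | B.pack "ATOM"  `B.isPrefixOf` bs = AtomLine (Atom bs)
  | B.pack "SHEET" `B.isPrefixOf` bs = SheetLine (Sheet bs)
  | otherwise                        = OtherLine bs

eachLine :: B.ByteString -> ([Atom], [Sheet])
eachLine bs = case parseLine bs of
  AtomLine a  -> ([a], [])
  SheetLine s -> ([], [s])
  OtherLine _ -> ([], [])   -- skip unrecognised lines

-- foldMap f is the same as mconcat . map f
allLines :: [B.ByteString] -> ([Atom], [Sheet])
allLines = foldMap eachLine
```

Separating parsing from accumulation means the compiler can warn about a missing case when a new constructor is added to `Line`.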

慢半拍i

2021-02-06 12:01

It is a good idea to introduce a new ADT, e.g. "Summary", instead of tuples. Then, since you want to accumulate the values of Summary, you can make it an instance of Monoid. Then you classify each of your lines with the help of classifier functions (e.g. isAtom, isSheet, etc.) and concatenate them together using Monoid's mconcat function (as suggested by @dave4420).

Here is the code (it uses String instead of ByteString, but it is quite easy to change):

```
module Classifier where

import Data.List (isPrefixOf)
import Data.Monoid

data Summary = Summary
  { atoms :: [String]
  , sheets :: [String]
  , digits :: [String]
  } deriving (Show)

-- GHC 8.4 and later require a Semigroup instance before a Monoid instance.
instance Semigroup Summary where
  Summary as1 ss1 ds1 <> Summary as2 ss2 ds2 =
    Summary (as1 <> as2)
            (ss1 <> ss2)
            (ds1 <> ds2)

instance Monoid Summary where
  mempty = Summary [] [] []

classify :: [String] -> Summary
classify = mconcat . map classifyLine

classifyLine :: String -> Summary
classifyLine line
  | isAtom line  = Summary [line] [] [] -- or "mempty { atoms = [line] }"
  | isSheet line = Summary [] [line] []
  | isDigit line = Summary [] [] [line]
  | otherwise    = mempty -- or "error" if you need this

isAtom, isSheet, isDigit :: String -> Bool
isAtom = isPrefixOf "atom"
isSheet = isPrefixOf "sheet"
isDigit = isPrefixOf "digits"

input :: [String]
input = ["atom1", "sheet1", "sheet2", "digits1"]

test :: Summary
test = classify input
```


无人共我

2021-02-06 12:06

If you have only 2 alternatives, using Either might be a good idea. In that case combine your functions, map the list, and use lefts and rights to get the results:

```
import Data.Either (lefts, rights)

-- first sample function, returning String
f1 :: Int -> String
f1 x = show $ x `div` 2

-- second sample function, returning Int
f2 :: Int -> Int
f2 x = 3*x+1

-- combined function returning Either String Int
hotpo :: Int -> Either String Int
hotpo x = if even x then Left (f1 x) else Right (f2 x)

xs :: [Either String Int]
xs = map hotpo [1..10]
-- [Right 4,Left "1",Right 10,Left "2",Right 16,Left "3",Right 22,Left "4",Right 28,Left "5"]

-- In GHCi:
-- lefts xs
-- ["1","2","3","4","5"]

-- rights xs
-- [4,10,16,22,28]
```
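Calling lefts and rights separately walks the list twice. If that matters, partitionEithers from Data.Either splits the results in one call. A small sketch, using a hypothetical `step` function that follows the same rule as hotpo above:

```haskell
import Data.Either (partitionEithers)

-- Same rule as hotpo above, with an explicit signature.
step :: Int -> Either String Int
step x
  | even x    = Left (show (x `div` 2))
  | otherwise = Right (3 * x + 1)

-- partitionEithers returns (all Lefts, all Rights), each in input order.
halves :: [String]
triples :: [Int]
(halves, triples) = partitionEithers (map step [1 .. 10])
-- halves  == ["1","2","3","4","5"]
-- triples == [4,10,16,22,28]
```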


甜味超标

2021-02-06 12:17
First of all, I think that the answers others have supplied will work at least 95% of the time. It's always good practice to code for the problem at hand by using appropriate data types (or tuples in some cases). However, sometimes you really don't know in advance what you're looking for in the list, and in these cases trying to enumerate all possibilities is difficult/time-consuming/error-prone. Or, you're writing multiple variants of the same sort of thing (manually inlining multiple folds into one) and you'd like to capture the abstraction.

Fortunately, there are a few techniques that can help.

The framework solution

(somewhat self-evangelizing)

First, the various "iteratee/enumerator" packages often provide functions to deal with this sort of problem. I'm most familiar with iteratee, which would let you do the following:
```
import qualified Data.ByteString.Char8 as B
import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Maybe

-- first, you'll need some way to process the Atoms/Sheets/etc. you're getting
-- if you want to just return them as a list, you can use the built-in
-- stream2list function

-- next, create stream transformers
-- given at :: B.ByteString -> Maybe Atom
-- create a stream transformer from ByteString lines to Atoms
atIter :: Enumeratee [B.ByteString] [Atom] m a
atIter = I.mapChunks (catMaybes . map at)

otIter :: Enumeratee [B.ByteString] [Sheet] m a
otIter = I.mapChunks (catMaybes . map ot)

-- finally, combine multiple processors into one
-- if you have more than one processor, you can use zip3, zip4, etc.
procFile :: Iteratee [B.ByteString] m ([Atom],[Sheet])
procFile = I.zip (atIter =$ stream2list) (otIter =$ stream2list)

-- and run it on some data
runner :: FilePath -> IO ([Atom],[Sheet])
runner filename = do
  resultIter <- enumFile defaultBufSize filename $= enumLinesBS $ procFile
  run resultIter
```
One benefit this gives you is extra composability. You can create transformers as you like, and just combine them with zip. You can even run the consumers in parallel if you like (although only if you're working in the IO monad, and probably not worth it unless the consumers do a lot of work) by changing to this:
```
import Data.Iteratee.Parallel

parProcFile = I.zip (parI $ atIter =$ stream2list) (parI $ otIter =$ stream2list)
```
The result of doing so isn't the same as a single for-loop - this will still perform multiple traversals of the data. However, the traversal pattern has changed. This will load a certain amount of data at once (defaultBufSize bytes) and traverse that chunk multiple times, storing partial results as necessary. After a chunk has been entirely consumed, the next chunk is loaded and the old one can be garbage collected.

Hopefully this will demonstrate the difference:
```
Data.List.zip:
  x1 x2 x3 .. x_n
                   x1 x2 x3 .. x_n

Data.Iteratee.zip:
  x1 x2      x3 x4      x_n-1 x_n
       x1 x2      x3 x4           x_n-1 x_n
```
If you're doing enough work that parallelism makes sense this isn't a problem at all. Due to memory locality, the performance is much better than multiple traversals over the entire input as Data.List.zip would make.
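The chunk-at-a-time traversal pattern can be imitated on plain lists, which may help to see what the diagram means. This is only an illustration of the access pattern and does not use the iteratee library; `chunksOf` and `zipFoldsChunked` are names invented for this sketch.

```haskell
import Data.List (foldl')

-- Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Run two left folds chunk by chunk. Each chunk is traversed twice while it
-- is still "hot", and only the two accumulators survive between chunks.
zipFoldsChunked
  :: Int -> (a -> x -> a) -> a -> (b -> x -> b) -> b -> [x] -> (a, b)
zipFoldsChunked n f a0 g b0 = foldl' step (a0, b0) . chunksOf n
  where
    step (a, b) chunk =
      let a' = foldl' f a chunk
          b' = foldl' g b chunk
      in a' `seq` b' `seq` (a', b')

-- Sum and length of [1..10], processed two elements at a time.
example :: (Int, Int)
example = zipFoldsChunked 2 (+) 0 (\n _ -> n + 1) 0 [1 .. 10]
-- (55,10)
```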

The beautiful solution

If a single-traversal solution really does make the most sense, you might be interested in Max Rabkin's Beautiful Folding post, and Conal Elliott's follow-up posts. The essential idea is that you can create data structures to represent folds and zips, and combining these lets you create a new, combined fold/zip function that only needs one traversal. It's maybe a little advanced for a Haskell beginner, but since you're thinking about the problem you may find it interesting or useful. Max's post is probably the best starting point.
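As a taste of the idea, here is a minimal sketch of a fold represented as data and combined through Applicative. It is written in the spirit of those posts rather than copied from them, so the names (`Fold`, `runFold`, `sumF`, `lengthF`, `mean`) and details are assumptions of this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- A left fold packaged as data: step function, initial state, and a final
-- projection. The state type s is hidden so different folds can be combined.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Strict pair so neither accumulator builds up thunks.
data Pair s t = Pair !s !t

instance Functor (Fold a) where
  fmap f (Fold step z done) = Fold step z (f . done)

instance Applicative (Fold a) where
  pure x = Fold (\() _ -> ()) () (const x)
  Fold stepF zF doneF <*> Fold stepX zX doneX =
    Fold step (Pair zF zX) (\(Pair f x) -> doneF f (doneX x))
    where
      -- Both folds advance on the same element, so the input is read once.
      step (Pair f x) a = Pair (stepF f a) (stepX x a)

runFold :: Fold a b -> [a] -> b
runFold (Fold step z done) = done . foldl' step z

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

-- Two folds combined into one; runFold traverses the list a single time.
mean :: Fold Double Double
mean = (/) <$> sumF <*> (fromIntegral <$> lengthF)
```

For example, `runFold mean [1, 2, 3, 4]` evaluates to `2.5`, and `runFold ((,) <$> sumF <*> lengthF) [1 .. 10]` collects a sum and a count in the same pass.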