Haskell - Having trouble understanding a small bit of code

Submitted by 帅比萌擦擦* on 2020-01-13 19:54:26

Question


I am doing a school task where I am given a small bit of sample code which I can use later. I understand 90% of this code but there is one little line/function that I for the life of me can't figure out what it does (I am very new to Haskell btw).

Sample code:

data Profile = Profile {matrix::[[(Char,Int)]], moleType::SeqType, nrOfSeqs::Int, nm::String} deriving (Show)

nucleotides = "ACGT"
aminoacids = sort "ARNDCEQGHILKMFPSTWYVX"

makeProfileMatrix :: [MolSeq] -> [[(Char, Int)]]
makeProfileMatrix [] = error "Empty sequence list"
makeProfileMatrix sl = res
  where 
    t = seqType (head sl)
    defaults = 
      if (t == DNA) then
        zip nucleotides (replicate (length nucleotides) 0) -- Row 1
      else 
        zip aminoacids (replicate (length aminoacids) 0)   -- Row 2
    strs = map seqSequence sl                              -- Row 3
    tmp1 = map (map (\x -> ((head x), (length x))) . group . sort)
               (transpose strs)                            -- Row 4
    equalFst a b = (fst a) == (fst b)
    res = map sort (map (\l -> unionBy equalFst l defaults) tmp1)

{-Row 1: 'replicate' creates a list of zeros that is equal to the length of the 'nucleotides' string. 
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the nucleotides-}

{-Row 2: 'replicate' creates a list of zeros that is equal to the length of the 'aminoacids' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the aminoacids-}

{-Row 3: The function 'seqSequence' is applied to each element in the 'sl' list and then returns a new altered list. 
In other words 'strs' becomes a list that contains all the sequences in 'sl' (sl contains MolSeq objects, not strings)-}

{-Row 4: (transpose strs) creates a list that has each 'column' of sequences as an element (the first element is made up of each first element in each sequence etc.).
--}

I have written an explanation for each marked Row in the code (which I think so far is correct) but I get stuck when I try to figure out what Row 4 does. I understand the 'transpose' bit but I can't at all figure out what the inner map function does. As far as I know a 'map' function needs a list as a second parameter to function but the inner map function only has an anonymous function but no list to operate on. To be perfectly clear I don't understand what the entire inner line map (\x -> ((head x), (length x))) . group . sort does. Please help!

Bonus!:

Here is another piece of sample code that I can't figure out (never worked with classes in Haskell):

class Evol object where
 name :: object -> String
 distance :: object -> object -> Double
 distanceMatrix :: [object] -> [(String, String, Double)]
 addRow :: [object] -> Int -> [(String, String, Double)]
 distanceMatrix [] = []
 distanceMatrix object =
  addRow object 0 ++ distanceMatrix (tail object)
 addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num


 -- Determines the name and distance of an instance of "Evol" if the instance is a "MolSeq".
instance Evol MolSeq where
 name = seqName
 distance = seqDistance

 -- Determines the name and distance of an instance of "Evol" if the instance is a "Profile".
instance Evol Profile where
 name = profileName
 distance = profileDistance

Especially this part:

addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num

You don't have to explain this one if you don't want to; I am just slightly confused about what 'addRow' is actually trying to do (in detail).

Thanks!


Answer 1:


map (\x -> (head x, length x)) . group . sort is an idiomatic way of generating a histogram. When you see something like this that you don’t understand, try breaking it down into smaller pieces and testing them on sample inputs:

(\x -> (head x, length x)) "AAAA"
-- ('A', 4)

(group . sort) "CABABA"
-- ["AAA", "BB", "C"]

(map (\x -> (head x, length x)) . group . sort) "CABABA"
map (\x -> (head x, length x)) (group (sort "CABABA"))
-- [('A', 3), ('B', 2), ('C', 1)]

It’s written in point-free style as a composition of 3 functions, map (…), group, and sort, but could also be written as a lambda:

\row -> map (…) (group (sort row))

For each row in the transposed matrix, it produces a histogram of the data in that row. You could get a more visual representation of this by formatting it and printing it out:

let
  showHistogramRow row = concat
    [ show $ head row
    , ":\t"
    , replicate (length row) '#'
    ]
  input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

putStr
  $ unlines
  $ map showHistogramRow
  $ group
  $ sort input

-- 1:   ##
-- 2:   #
-- 3:   ##
-- 4:   #
-- 5:   ###
-- 6:   #
-- 9:   #
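To tie this back to Row 4 of the question, here is a minimal sketch that applies the same histogram to every column of a few sequences. The names histogram, columnHistograms and sampleSeqs are made up for illustration, and the sample strings are not from the assignment:

```haskell
import Data.List (group, sort, transpose)

-- Count how often each element occurs: sort brings equal elements
-- together, group splits them into runs, and each run becomes a
-- (value, count) pair. head is safe here because group never
-- produces an empty run.
histogram :: Ord a => [a] -> [(a, Int)]
histogram = map (\run -> (head run, length run)) . group . sort

-- One histogram per column, mirroring tmp1 in the question.
columnHistograms :: [String] -> [[(Char, Int)]]
columnHistograms = map histogram . transpose

-- Made-up sample data (an assumption, not taken from the assignment).
sampleSeqs :: [String]
sampleSeqs = ["ACG", "ATG", "ACC"]
```

Here transpose sampleSeqs is ["AAA", "CTC", "GGC"], so columnHistograms sampleSeqs evaluates to [[('A',3)], [('C',2),('T',1)], [('C',1),('G',2)]].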

As for this:

addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num

addRow makes a list of the distances from the first element in object to every element of the list, itself included (num starts at 0). It walks the list by index, which is not very obvious, when a simpler and more idiomatic map would suffice. Note that this version no longer needs the num argument:

addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object

Ordinarily it’s good to avoid partial functions such as head because they can throw an exception on some inputs (e.g. head []). Here it’s fine, however, because if the input list is empty, then a will never be used, and so head will never be called.

distanceMatrix could be expressed with a map as well, because it’s just calling a function (addRow) on all the tails of the list and concatenating them together with ++:

distanceMatrix object = concatMap addRow (tails object)

This could be written in point-free style too. \x -> f (g x) can be written as just f . g; here, f is concatMap addRow and g is tails:

distanceMatrix = concatMap addRow . tails

Evol just describes the set of types for which you can generate a distanceMatrix, including MolSeq and Profile. Note that addRow and distanceMatrix don't need to be members of this class, because they're implemented entirely in terms of name and distance, so you could move them to the top level:

import Data.List (tails)

distanceMatrix :: (Evol object) => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

-- The Int index is gone from the type, since map visits every element.
addRow :: (Evol object) => [object] -> [(String, String, Double)]
addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object
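To see what the resulting list looks like, here is a self-contained sketch of the top-level version. Point and samplePoints are hypothetical stand-ins for MolSeq or Profile, which are not shown in the question, and the class is reduced to name and distance:

```haskell
import Data.List (tails)

class Evol object where
  name     :: object -> String
  distance :: object -> object -> Double

-- Pair the first element with every element of the list, itself included.
addRow :: Evol object => [object] -> [(String, String, Double)]
addRow object = map (\b -> (name a, name b, distance a b)) object
  where a = head object  -- never evaluated when the list is empty

-- One row per suffix of the list, concatenated together.
distanceMatrix :: Evol object => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

-- Hypothetical toy type: a label and a position on a line.
data Point = Point String Double

instance Evol Point where
  name (Point n _) = n
  distance (Point _ x) (Point _ y) = abs (x - y)

-- Made-up sample data.
samplePoints :: [Point]
samplePoints = [Point "p" 0, Point "q" 3, Point "r" 5]
```

distanceMatrix samplePoints evaluates to [("p","p",0.0), ("p","q",3.0), ("p","r",5.0), ("q","q",0.0), ("q","r",2.0), ("r","r",0.0)], which is the upper triangle of the matrix including the diagonal. The final empty suffix from tails contributes nothing.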



Answer 2:


the inner map function only has an anonymous function but no list to operate on

Suppose there is a function f of type a -> b -> c, which takes two arguments and returns a value of type c. If f is applied to only one argument, it returns another function of type b -> c, which takes the remaining argument and returns the value. This is called partial application, and it works because functions in Haskell are curried.
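As a small illustration of supplying fewer arguments than a function expects, consider the sketch below. The names add, addFive and incrementAll are made up for this example:

```haskell
-- A two-argument function; its type really reads Int -> (Int -> Int).
add :: Int -> Int -> Int
add x y = x + y

-- Supplying one argument yields a function still waiting for the second.
addFive :: Int -> Int
addFive = add 5

-- map behaves the same way: given only the function to apply, it
-- returns a function that is still waiting for a list.
incrementAll :: [Int] -> [Int]
incrementAll = map (add 1)
```

addFive 10 gives 15, and incrementAll [1, 2, 3] gives [2, 3, 4]. The inner map in the question is in the same position as incrementAll: it has its function but has not yet received its list.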

This line:

map (map (\x -> ((head x), (length x))) . group . sort) (transpose strs)

can be transformed into:

map (\str -> (map (\x -> ((head x), (length x))) . group . sort) str)(transpose strs)

In this form, it may be clearer that there is actually a list to operate on.

This function

(map (\x -> ((head x), (length x))) . group . sort)

is just a composition of sort, group and map (\x -> ((head x), (length x))).

Let's see how it works on [2,1,1,1,4]:

sort [2, 1, 1, 1, 4] => [1, 1, 1, 2, 4]

group [1, 1, 1, 2, 4] => [[1,1,1],[2],[4]]

map (\x -> ((head x), (length x))) [[1,1,1],[2],[4]] => [(1,3),(2,1),(4,1)]

It just returns a list of tuples. Each tuple contains an element as its first component and the number of occurrences of that element as its second component.
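The last line of the question's function, res, then appears to fill in a zero count for every letter that did not occur in a column. A rough sketch of that step for the DNA case follows. fillMissing and sameLetter are hypothetical names, while defaults and nucleotides follow Row 1 of the question:

```haskell
import Data.List (sort, unionBy)

nucleotides :: String
nucleotides = "ACGT"

-- Zero counts for every DNA letter, as built by Row 1 in the question.
defaults :: [(Char, Int)]
defaults = zip nucleotides (replicate (length nucleotides) 0)

-- unionBy keeps everything in counts and appends only those defaults
-- whose letter is not already present; sort then orders by letter.
fillMissing :: [(Char, Int)] -> [(Char, Int)]
fillMissing counts = sort (unionBy sameLetter counts defaults)
  where sameLetter a b = fst a == fst b
```

fillMissing [('C',2), ('T',1)] evaluates to [('A',0), ('C',2), ('G',0), ('T',1)], so every column ends up with one entry per letter.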



Source: https://stackoverflow.com/questions/46131310/haskell-having-trouble-understanding-a-small-bit-of-code
