Haskell: converting a list of (a, b) key-value pairs (with possibly repeated keys) to a list of (a, [b]) grouped by key

前端 未结 6 1285
遥遥无期
遥遥无期 2021-01-01 17:13

I\'m a Haskell beginner. Let\'s suppose I want to write a function convertKVList that takes a flat list of key-value pairs, where some of the keys might be rep

相关标签:
6条回答
  • 2021-01-01 17:56

    Hoogle is not the only search engine able to search Haskell libraries by type signatures and it definitely and unfortunately covers only a small part of Hackage. Searching with Hayoo for a type signature [(a,b)]->[(a,[b])] brought up these two implementations:

    • aggregateAL
    • alistCollect

    Concerning your take on the problem, since in your function you already bring up a higher level datastructure (Map), it doesn't make sense to downgrade to a more primitive associative list in the output, because:

    1. Most of the algorithms you can possibly utilize such data in will only benefit from getting a Map input, because it's much more effective for dealing with key-value stores, and if you ever find yourself still needing a list you can always utilize the toList in place.
    2. Map implies the absence of duplicate keys on type level, which is not any less important, since in Haskell you should always do the maximum proofs using the type-system. This principle is essentially what makes the statement "If it compiles, it works" closest to truth.

    In other words this is the correct definition of your function:

    convertKVList :: (Ord a) => [(a, b)] -> Map a [b]
    convertKVList ls =
      Map.fromListWith (++) . map (\(x,y) -> (x,[y])) $ ls
    

    Hayooing for that type signature brings a couple of already implemented results too.

    Concerning the approaching the problem, it's classic: "Divide and conquer!". Chris has some good points too in his answer.

    0 讨论(0)
  • 2021-01-01 17:57

    That looks like an understandable solution and you can clean it up slightly more:

    import Data.Map (toList, fromListWith)
    import Control.Arrow (second)
    
    convertKVList :: Ord a => [(a, b)] -> [(a, [b])]
    convertKVList = toList . fromListWith (++) . map (second (:[]))
    

    Regarding how you might come up with this on your own: assuming you started with Data.Map, then you want to use the map to combine values with equal keys. The documentation for Data.Map on Hackage says a is the type for values and k for keys.

    Knowing this, you can search for a -> a -> a to find functions that might combine two values in a Map k a to produce a new a value. This narrows the API down to a handful of functions like insertWith, fromListWith, and fromAscListWith.

    Similarly, to convert your Map k a to [(k, a)], you can search the documentation for Map k a -> [(k, a)] and find only a few functions like assocs, toList, toAscList, and toDescList. Note that in your case, [(k, a)] is instantiated to [(Int, [Int])].

    One thing I've found helpful in understanding the standard Haskell libraries is to view the source on Hackage. Seeing which functions are implemented in terms of others helps make the API feel smaller, and I can see which functions are the basic building blocks.

    0 讨论(0)
  • 2021-01-01 17:59

    One of the most important points when writing a function, is trying to split what it should do into separate sub-tasks (which are often put together by function composition in the end). E.g., in the definition you came up with, there are three tasks (in order of application, i.e., from right to left in the definition):

    1. map the 2nd component of each pair to a singleton list (thereby enabling the use of Map.fromListWith)
    2. create a map (which takes care of merging entries with equal keys)
    3. turn it into a list

    I wanted to post a different solution (which was an exact copy of the code Mark posted in the meanwhile ;)). Just to make clear that most of the time there are different routes to the same goal. In his definition you have the separate tasks:

    1. sort the list by keys
    2. group the result by keys
    3. turn it into a list of the desired type

    Once again, separation of concerns (modularity) is an important principle. Just try to apply it to small problems and once you gained some experience you will be able to come up with surprisingly simple solutions to seemingly difficult problems.

    0 讨论(0)
  • 2021-01-01 18:01

    I suspect that without dipping into mutation and the ST monad, you are unlikely to improve on the Map.fromListWith solution (or substantially equivalent alternatives like using HashMap.fromListWith). I'd just go with that.

    Basically, with mutation you can do this grouping in near-linear time by using a mutable hash table with a as the keys and mutable lists of b as the values. Without mutation, however, it's going to be worse, because each insert into a balanced search tree is O(log n); this is because "inserting" means constructing a new copy of each tree node that leads to the one your inserted element goes in. And you need to do n inserts—which gives you exactly the O(n * log n) bounds that the Map.fromListWith function has. Sorting the association list ahead of time doesn't fundamentally improve this, because sorting is also O(n * log n).

    So to improve on O(n * log n), you need data structures with mutation. I just did a quick Google and the best bet would be to implement the standard imperative algorithm using something like the hashtables library (which I've never tried, so I can't vouch for it). To use this you're going to need to understand Control.Monad.ST and Data.STRef. The ST monad is a technique that GHC provides for using mutation "internally" in a pure function—it uses some type system extensions to guarantee that the side effects cannot be observed outside of the functions in question. HaskellWiki has some examples, but it might take some studying and practice to feel comfortable with this one.

    The other thing I would recommend, if you feel like you want to understand Data.Map or similar libraries better, is to look at Chris Okasaki's Purely Functional Data Structures book (or his dissertation (PDF) that the book is based on). It's based on Standard ML instead of Haskell, the data structures are not the same, and it can be a bit of a difficult read, but it's a foundational book.

    0 讨论(0)
  • 2021-01-01 18:03

    So, my solution overuses pattern-matching because I don't actually know what functions are in the standard library.

    The idea was that if the list is sorted by the keys, then you can collect your key-values as you go. To do the logic of checking whether to add to the first key-value list or create a new entry, I used patterns and guards to define the conditionals. And liberal use of cons to prepend values to a list.

    And in case the original list isn't sorted, there's a sortBy.

    import Data.List
    import Data.Ord
    
    ls = [(2, 1), (1, 2), (1, 4), (1, 3), (2, 3)]
    
    addval [] (k, v)= [(k, [v])]
    addval ((k1, vals) : xs) (k2, v) | k1 == k2
      = ((k1, (v : vals)) : xs)
    addval ls (k, v) = ((k, [v]) : ls)
    
    convert ls = foldl addval [] (sortBy (comparing fst) ls)
    

    Ugly code, but it avoids using Map.

    0 讨论(0)
  • 2021-01-01 18:10

    while this is by no means canonical:

    import Data.List
    import Data.Ord
    import Data.Function (on)
    
    convertKVList :: Ord a => [(a,b)] -> [(a,[b])]
    convertKVList = map (\x -> (fst $ head x,  map snd x)) . groupBy ((==) `on` fst) . sortBy (comparing fst)
    

    it does have the advantage of not pulling in Data.Map. should be asymptotically the same, haven't benchmarked. I think you could clean up the first chunk with Control.Arrow (something like (fst . head &&& map snd)) but it's not obviously cleaner.

    Not sure how you'd come to it except by either knowing it or asking in #haskell, though.

    0 讨论(0)
提交回复
热议问题