Haskell “collections” language design

耶瑟儿~ 2021-02-07 03:21

Why is the Haskell implementation so focused on linked lists?

For example, I know Data.Sequence is more efficient with most of the list operations (except for the

5 answers
  • 2021-02-07 03:34

    Since version 7.8, GHC supports overloaded list literals through the OverloadedLists extension (see the manual). For example, given appropriate IsList instances, you can write

    ['0' .. '9']             :: Set Char
    [1 .. 10]                :: Vector Int
    [("default",0), (k1,v1)] :: Map String Int
    ['a' .. 'z']             :: Text
    

    (quoted from the documentation).
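    As a rough sketch of what this looks like in a complete module: the following assumes the containers package that ships with GHC, whose Set and Map types have IsList instances in recent versions. The names digits, defaults, isDigitChar and defaultValue are made up for illustration.

```haskell
{-# LANGUAGE OverloadedLists #-}

import           Data.Map (Map)
import qualified Data.Map as Map
import           Data.Set (Set)
import qualified Data.Set as Set

-- The list literal is desugared through IsList's fromList,
-- so the type signature decides which container is built.
digits :: Set Char
digits = ['0' .. '9']

defaults :: Map String Int
defaults = [("default", 0), ("answer", 42)]

-- Hypothetical helpers that use the containers built above.
isDigitChar :: Char -> Bool
isDigitChar c = Set.member c digits

defaultValue :: Maybe Int
defaultValue = Map.lookup "default" defaults
```

    Note that with the extension switched on, every list literal in the module becomes polymorphic, so some expressions that used to type-check may need an annotation.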

  • 2021-02-07 03:35

    A nitpick: Data.Sequence isn't more efficient for "list operations"; it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) than a list does for (:), and its memory representation is also somewhat larger than a list's, since it is built from a FingerTree whose Deep nodes carry extra structure.
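    To make the distinction concrete, here is a small sketch using the Data.Sequence API from the containers package. The names snocList, snocSeq and example are illustrative only.

```haskell
import           Data.Foldable (toList)
import           Data.Sequence (Seq, (<|), (|>))
import qualified Data.Sequence as Seq

-- Appending at the right end of a list copies the whole spine: O(n).
snocList :: [a] -> a -> [a]
snocList xs x = xs ++ [x]

-- The same operation on a Seq takes constant time.
snocSeq :: Seq a -> a -> Seq a
snocSeq = (|>)

-- Prepend with (<|), append with (|>).
example :: Seq Int
example = snocSeq (0 <| Seq.fromList [1, 2, 3]) 4

-- Indexing a Seq is logarithmic, whereas (!!) on a list is linear.
lastOfExample :: Int
lastOfExample = Seq.index example (Seq.length example - 1)
```

    For code that only conses onto the front and pattern-matches from the left, the plain list remains the lighter choice.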

    The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.

    List isn't an ideal representation for Strings: the memory layout is inefficient because each Char is wrapped in a constructor. This is why ByteStrings were introduced. Although they are laid out as arrays, ByteStrings have to do a bit of administrative work, so [Char] can still be competitive if you are using short strings. In GHC, the OverloadedStrings language extension gives ByteStrings String-like literal syntax.
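    A minimal sketch of string-literal syntax for ByteStrings, assuming GHC's OverloadedStrings extension and the bytestring package that ships with GHC. The names greeting and greetingLength are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B

-- The literal is converted via the IsString class,
-- so it can be used directly as a ByteString.
greeting :: B.ByteString
greeting = "hello, world"

-- Length is read from the packed buffer's header rather than by walking a list.
greetingLength :: Int
greetingLength = B.length greeting
```

    The Char8 interface truncates characters to 8 bits, so this is only suitable for ASCII or Latin-1 text.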

    The other major lazy functional language, Clean, has always represented strings as byte arrays, but its type system made this more practical. I believe the ByteString library uses unsafePerformIO under the hood.

  • 2021-02-07 03:37

    I am pretty sure this won't be an answer to your question, but still.

    I wish Haskell allowed more liberal function names (mixfix!) à la Agda. Then the syntax for the list constructors, (:) and [], wouldn't have to be magic, and we could at least hide the list type and use the same tokens for our own types.

    The amount of code change while migrating between list and custom sequence types would be minimal then.

    About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.

    import Prelude hiding(map)
    
    map :: (Functor f) => (a -> b) -> f a -> f b
    map = fmap
    

    Prelude is great, but it isn't the best part of Haskell.

  • 2021-02-07 03:56

    Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class. Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.

    • fmap instead of map
    • mempty instead of []
    • mappend instead of (++)
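    As an illustration of the generalized functions listed above, here is a sketch of one function that runs unchanged on lists and on Data.Sequence. The Cons class and the extend function are hypothetical, written here only to show what "write your own" generalization of (:) could look like; they are not part of any standard library.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import           Data.Foldable (toList)
import           Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Hypothetical class generalizing (:) to other containers.
class Cons f where
  cons :: a -> f a -> f a

instance Cons [] where
  cons = (:)

instance Cons Seq where
  cons = (Seq.<|)

-- Uses only class methods: fmap (Functor), mappend (Monoid), cons (Cons).
-- Prepends 0 to the original elements followed by copies shifted by 10.
extend :: (Functor f, Cons f, Monoid (f Int)) => f Int -> f Int
extend xs = cons 0 (xs `mappend` fmap (+ 10) xs)
```

    Calling extend on the list [1, 2] and on Seq.fromList [1, 2] yields the same elements in the same order, in whichever container was passed in.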

    If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.

    -- For now, use lists
    type List a = [a]

    nil :: List a
    nil = []

    cons :: a -> List a -> List a
    cons x xs = x : xs

    {- Switch to Seq in the future
       (needs: import Data.Sequence (Seq, empty, (<|)))
    type List a = Seq a
    nil = empty
    cons x xs = x <| xs
    -}
    

    Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matching code.
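    To show what that rewrite looks like in practice, here is a sketch of the same recursive sum written once with list patterns and once against Data.Sequence's viewl view. The function names are illustrative.

```haskell
import           Data.Sequence (Seq, ViewL (..), viewl)
import qualified Data.Sequence as Seq

-- List version: matches directly on the [] and (:) constructors.
sumList :: [Int] -> Int
sumList []       = 0
sumList (x : xs) = x + sumList xs

-- Seq version: the patterns must change, here to the ViewL
-- constructors EmptyL and (:<) obtained through viewl.
sumSeq :: Seq Int -> Int
sumSeq s = case viewl s of
  EmptyL    -> 0
  x :< rest -> x + sumSeq rest
```

    The logic is identical, but every equation had to be touched, which is the migration cost described above.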


    Why there's so much list-specific stuff in Haskell

    Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
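    A small sketch of that idiom, assuming Data.Set from the containers package. The function squareResidues is invented for the example.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- The list comprehension plays the role of the loop that generates
-- elements; Set.fromList consumes it and builds the set.
squareResidues :: Int -> Set Int
squareResidues n = Set.fromList [x * x `mod` 7 | x <- [1 .. n]]
```

    Because the list is produced lazily and consumed immediately, it acts as a description of the iteration rather than as a stored data structure.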

    The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-specific code, so new data structures and methods can be just as efficient as the "built-in" ones.

    There are several classes that are useful generalizations of lists:

    • Functor supplies fmap, a generalization of map.
    • Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
    • Applicative and Monad supply methods that are useful for interpreting collections as computations.
    • Traversable and Foldable supply useful methods for running computations over collections.
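    As a sketch of how the Foldable and Traversable classes let one definition serve many containers (total and safeRecips are illustrative names, not library functions):

```haskell
import           Data.Foldable (foldl')
import qualified Data.Map as Map
import qualified Data.Sequence as Seq

-- Works on any Foldable: lists, Seq, Map values, Maybe, and so on.
total :: (Foldable t, Num a) => t a -> a
total = foldl' (+) 0

-- Works on any Traversable: runs a computation that may fail over each
-- element and returns Nothing if any element is zero.
safeRecips :: Traversable t => t Double -> Maybe (t Double)
safeRecips = traverse (\x -> if x == 0 then Nothing else Just (1 / x))
```

    The same total can be applied to a list, to Seq.fromList [1, 2, 3], or to the values of a Map, without any container-specific code.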

    Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.

  • 2021-02-07 03:57

    I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.

    EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:

    3. You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that’s a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.

    For (:), the (<|) function seems to be a replacement. I have no idea about [].
