Why do We Need Sum Types?

[愿得一人] 2021-02-08 04:56

Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing

data Color = White | Black | Blue

we wo

4 answers
  • 2021-02-08 05:38

    Haskell's sum type is very similar to your :|:.

    The difference between the two is that the Haskell sum type | is a tagged union, while your "sum type" :|: is untagged.

    Tagged means every instance is unique - you can distinguish Int | Int from Int (actually, this holds for any a):

    data EitherIntInt = Left Int | Right Int
    

    In this case: Either Int Int carries more information than Int because there can be a Left and Right Int.

    In your :|:, you cannot distinguish those two:

    type EitherIntInt = Int :|: Int
    

    How do you know if it was a left or right Int?
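    A minimal sketch of what the tag buys you, using the standard Either from the Prelude. The names whichSide and sameIntDifferentTag are illustrative only and do not come from the question.

```haskell
-- Both alternatives hold an Int, but the constructor records which side it is.
whichSide :: Either Int Int -> String
whichSide (Left n)  = "left "  ++ show n
whichSide (Right n) = "right " ++ show n

-- Two values wrapping the same Int remain distinguishable by their tag.
sameIntDifferentTag :: Bool
sameIntDifferentTag = (Left 3 :: Either Int Int) /= Right 3
```

    With an untagged Int :|: Int there would be nothing for whichSide to match on.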

    See the comments for an extended discussion of the section below.

    Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int? Either this is isomorphic to Int by definition or the compiler has to decide which Int (left or right) to choose, which is impossible if they are indistinguishable.
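    As a sketch of the exhaustiveness check, here is the Color type from the question with a total function over it. The function name colorName is an assumption made for illustration. With -Wincomplete-patterns (which is part of -Wall), GHC warns if one of the three clauses is deleted.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data Color = White | Black | Blue

-- One clause per constructor. Removing any clause makes GHC report
-- the missing constructor when the warning above is enabled.
colorName :: Color -> String
colorName White = "white"
colorName Black = "black"
colorName Blue  = "blue"
```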

    Consider another example:

    type (Integral a, Num b) => IntegralOrNum a b = a :|: b    -- untagged
    data (Integral a, Num b) => IntegralOrNum a b = Left a | Right b -- tagged
    

    What is 5 :: IntegralOrNum Int Double in the untagged union? It is both an instance of Integral and Num, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5 should be because it is branded with either Left or Right.
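    In plain Haskell the same point can be sketched with Either Int Double: the constructor decides how the literal 5 is read. The names asIntegral, asNum and render are hypothetical.

```haskell
-- The tag fixes the type of the literal: under Left it is an Int.
asIntegral :: Either Int Double
asIntegral = Left 5

-- Under Right the very same literal is a Double.
asNum :: Either Int Double
asNum = Right 5

-- Show whichever side is present.
render :: Either Int Double -> String
render = either show show
```

    Here render asIntegral and render asNum give different strings, because an Int and a Double are shown differently.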


    As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.

  • 2021-02-08 05:38

    I will try to expand the categorical argument mentioned by @BenjaminHodgson.

    Haskell can be seen as the category Hask, in which objects are types and morphisms are functions between types (disregarding bottom).

    We can define a product in Hask as the tuple type - categorically speaking it meets the definition of the product:

    A product of a and b is the type c equipped with projections p and q such that p :: c -> a and q :: c -> b and for any other candidate c' equipped with p' and q' there exists a morphism m :: c' -> c such that we can write p' as p . m and q' as q . m.

    Read up on this in Bartosz' Category Theory for Programmers for further information.
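    The product definition can be sketched for tuples, with p = fst and q = snd. The name factor and the example candidate (String with length and null as its projections) are assumptions chosen for illustration. For functions, factor behaves like (&&&) from Control.Arrow.

```haskell
-- Builds the mediating morphism m :: c' -> (a, b) from p' and q'.
factor :: (c -> a) -> (c -> b) -> c -> (a, b)
factor p' q' x = (p' x, q' x)

-- An example candidate c' = String with two projections.
lengthOf :: String -> Int
lengthOf = length

isEmpty :: String -> Bool
isEmpty = null

-- The unique m such that fst . m = lengthOf and snd . m = isEmpty.
m :: String -> (Int, Bool)
m = factor lengthOf isEmpty
```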

    Now for every category, there exists the opposite category, which has the same objects and morphisms but with all the arrows reversed. The coproduct is thus:

    The coproduct c of a and b is the type c equipped with injections i :: a -> c and j :: b -> c such that for all other candidates c' with i' and j' there exists a morphism m :: c -> c' such that i' = m . i and j' = m . j.

    Let's see how the tagged and untagged union perform given this definition:

    The untagged union of a and b is the type a :|: b such that:

    • i :: a -> a :|: b is defined as i a = a and
    • j :: b -> a :|: b is defined as j b = b

    However, we know that a :|: a is isomorphic to a. Based on that observation we can define a second candidate for the coproduct, a :|: a :|: b, which is equipped with the exact same morphisms. Therefore, there is no single best candidate, since the morphism m between a :|: a :|: b and a :|: b is id. id is a bijection, which implies that m is invertible and can "convert" between the types either way. For a visual representation of that argument, take the product diagram and replace p with i and q with j.

    Restricting ourselves to Either instead, the definition is satisfied, as you can verify yourself with:

    • i = Left and
    • j = Right

    This shows that the categorical dual of the product type is the disjoint union, not the set-based union.
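    A sketch of that verification for Either, where the mediating morphism is the Prelude's either. The name cofactor and the example candidate (String with i' and j' as its injections) are hypothetical.

```haskell
-- Builds the mediating morphism m :: Either a b -> c' from i' and j'.
cofactor :: (a -> c) -> (b -> c) -> Either a b -> c
cofactor = either

-- An example candidate c' = String with two injections.
i' :: Int -> String
i' = show

j' :: Bool -> String
j' b = if b then "yes" else "no"

-- The unique m such that m . Left = i' and m . Right = j'.
m :: Either Int Bool -> String
m = cofactor i' j'
```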

    The disjoint union can still be built on top of the set union, by wrapping each side in its own type first:

    data Left a = Left a
    data Right b = Right b
    type DisjUnion a b = Left a :|: Right b
    

    Because we have shown above that the set union is not a valid candidate for the coproduct of two types, we would lose many "free" properties (which follow from parametricity, as leftaroundabout mentioned) by not choosing the disjoint union in the category Hask (because there would be no coproduct).

  • 2021-02-08 05:46

    This is an idea I've thought a lot about myself: a language with “first-class type algebra”. Pretty sure we could do about everything this way that we do in Haskell. Certainly if these disjunctions were, like Haskell alternatives, tagged unions; then you could directly rewrite any ADT to use them. In fact GHC can do this for you: if you derive a Generic instance, a variant type will be represented by a :+: construct, which is in essence just Either.
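    A small sketch of the Generic point, assuming a hypothetical two-constructor type Shade. Its generic representation is a :+: of the two constructors, so matching on L1 and R1 recovers what is essentially an Either.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data Shade = Light | Dark
  deriving (Generic)

-- The outer M1 is the datatype metadata wrapper. Below it, L1 and R1
-- select the first and second constructor, much like Left and Right.
viaEither :: Shade -> Either () ()
viaEither s = case from s of
  M1 (L1 _) -> Left ()
  M1 (R1 _) -> Right ()
```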

    I'm not so sure if untagged unions would also do. As long as you require the types participating in a sum to be discernibly different, the explicit tagging should in principle not be necessary. The language would then need a convenient way to match on types at runtime. Sounds a lot like what dynamic languages do – obviously comes with quite some overhead though.
    The biggest problem would be that if the types on both sides of :|: must be unequal then you lose parametricity, which is one of Haskell's nicest traits.
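    For comparison, matching on types at runtime is already possible in Haskell through Data.Dynamic, at the cost of carrying a type representation around. This is only a sketch of the idea, not a proposal for how :|: would be implemented. The name describeDyn is made up.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)

-- Try each expected type in turn. fromDynamic returns Nothing when the
-- runtime type of the wrapped value does not match the requested one.
describeDyn :: Dynamic -> String
describeDyn d
  | Just n <- fromDynamic d = "Int " ++ show (n :: Int)
  | Just s <- fromDynamic d = "String " ++ s
  | otherwise               = "something else"
```

    Note the fallback clause: unlike a case over a sum type, nothing stops a caller from passing a type that was never anticipated.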

  • 2021-02-08 05:53

    Given that you mention TypeScript, it is instructive to have a look at what its docs have to say about its union types. The example there starts from a function...

    function padLeft(value: string, padding: any) { //etc.
    

    ... that has a flaw:

    The problem with padLeft is that its padding parameter is typed as any. That means that we can call it with an argument that’s neither a number nor a string

    One plausible solution is then suggested, and rejected:

    In traditional object-oriented code, we might abstract over the two types by creating a hierarchy of types. While this is much more explicit, it’s also a little bit overkill.

    Rather, the handbook suggests...

    Instead of any, we can use a union type for the padding parameter:

    function padLeft(value: string, padding: string | number) { // etc.
    

    Crucially, the concept of union type is then described in this way:

    A union type describes a value that can be one of several types.

    A string | number value in TypeScript can be either of string type or of number type, as string and number are subtypes of string | number (cf. Alexis King's comment to the question). An Either String Int value in Haskell, however, is neither of String type nor of Int type -- its only, monomorphic, type is Either String Int. Further implications of that difference show up in the remainder of the discussion:

    If we have a value that has a union type, we can only access members that are common to all types in the union.

    In a roughly analogous Haskell scenario, if we have, say, an Either Double Int, we cannot apply (2*) directly on it, even though both Double and Int have instances of Num. Rather, something like bimap is necessary.
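    A minimal sketch of that bimap usage, with bimap taken from Data.Bifunctor in base. The function name doubleBoth is illustrative.

```haskell
import Data.Bifunctor (bimap)

-- (2*) has to be supplied once per side: the first occurrence is used at
-- Double, the second at Int.
doubleBoth :: Either Double Int -> Either Double Int
doubleBoth = bimap (2 *) (2 *)
```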

    What happens when we need to know specifically whether we have a Fish? [...] we’ll need to use a type assertion:

    let pet = getSmallPet();
    
    if ((<Fish>pet).swim) {
        (<Fish>pet).swim();
    }
    else {
        (<Bird>pet).fly();
    }
    

    This sort of downcasting/runtime type checking is at odds with how the Haskell type system ordinarily works, even though it can be implemented using the very same type system (also cf. leftaroundabout's answer). In contrast, there is nothing to figure out at runtime about the type of an Either Fish Bird: the case analysis happens at value level, and there is no need to deal with anything failing and producing Nothing (or worse, null) due to runtime type mismatches.
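    A rough Haskell counterpart of the pet example, assuming hypothetical Fish and Bird types that simply wrap a name. The case analysis is an ordinary pattern match on Left and Right, with no cast involved.

```haskell
-- Hypothetical stand-ins for the handbook's Fish and Bird.
newtype Fish = Fish String
newtype Bird = Bird String

swim :: Fish -> String
swim (Fish name) = name ++ " swims"

fly :: Bird -> String
fly (Bird name) = name ++ " flies"

-- Each branch receives a value of the precise type it needs.
move :: Either Fish Bird -> String
move (Left fish)  = swim fish
move (Right bird) = fly bird
```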
