I am learning Haskell from learnyouahaskell.com. I am having trouble understanding type constructors and data constructors. For example, I don't really understand the diffe
Start with the simplest case:
data Color = Blue | Green | Red
This defines a "type constructor" Color
which takes no arguments - and it has three "data constructors", Blue
, Green
and Red
. None of the data constructors takes any arguments. This means that there are exactly three values of type Color
: Blue
, Green
and Red
.
A data constructor is used when you need to create a value of some sort. Like:
myFavoriteColor :: Color
myFavoriteColor = Green
creates a value myFavoriteColor
using the Green
data constructor - and myFavoriteColor
will be of type Color
since that's the type of values produced by the data constructor.
A type constructor is used when you need to create a type of some sort. This is usually the case when writing signatures:
isFavoriteColor :: Color -> Bool
In this case, you are calling the Color
type constructor (which takes no arguments).
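To make this concrete, here is a minimal sketch that puts the pieces so far into one file. The body of isFavoriteColor and the deriving clause are my own additions for illustration; the signature alone doesn't tell you how the function is implemented.

```haskell
-- Color is a type constructor with no arguments;
-- Blue, Green and Red are data constructors with no arguments.
data Color = Blue | Green | Red
  deriving (Eq, Show)  -- added so values can be compared and printed

-- Green (a data constructor) appears where a value is expected.
myFavoriteColor :: Color
myFavoriteColor = Green

-- Color (a type constructor) appears where a type is expected.
-- The body is an assumption: it simply compares against the favorite.
isFavoriteColor :: Color -> Bool
isFavoriteColor c = c == myFavoriteColor
```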
Still with me?
Now, imagine you not only wanted to create red/green/blue values but you also wanted to specify an "intensity", say a value between 0 and 255. You could do that by adding an argument to each of the data constructors, so you end up with:
data Color = Blue Int | Green Int | Red Int
Now, each of the three data constructors takes an argument of type Int
. The type constructor (Color
) still doesn't take any arguments. So, my favorite color being a darkish green, I could write
myFavoriteColor :: Color
myFavoriteColor = Green 50
And again, it calls the Green
data constructor and I get a value of type Color
.
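Data constructors are also what you pattern match on to get the argument back out. A small sketch, where the helper name intensity is made up for this example:

```haskell
data Color = Blue Int | Green Int | Red Int
  deriving (Eq, Show)

-- Applying the Green data constructor to an Int builds a Color.
myFavoriteColor :: Color
myFavoriteColor = Green 50

-- Hypothetical helper: matching on each data constructor
-- recovers the Int that was passed in when the value was built.
intensity :: Color -> Int
intensity (Blue n)  = n
intensity (Green n) = n
intensity (Red n)   = n
```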
Imagine if you don't want to dictate how people express the intensity of a color. Some might want a numeric value like we just did. Others may be fine with just a boolean indicating "bright" or "not so bright". The solution to this is to not hardcode Int
in the data constructors but rather use a type variable:
data Color a = Blue a | Green a | Red a
Now, our type constructor takes one argument (another type which we just call a
!) and all of the data constructors will take one argument (a value!) of that type a
. So you could have
myFavoriteColor :: Color Bool
myFavoriteColor = Green False
or
myFavoriteColor :: Color Int
myFavoriteColor = Green 50
Notice how we call the Color
type constructor with an argument (another type) to get the "effective" type which will be returned by the data constructors. This touches the concept of kinds which you may want to read about over a cup of coffee or two.
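Here is a sketch with both variants side by side. Since one module can't define myFavoriteColor twice, I've used the made-up names favoriteAsBool and favoriteAsInt; the intensity helper is also just an illustration.

```haskell
-- Color now takes one type argument, and each data constructor
-- takes one value of that type.
data Color a = Blue a | Green a | Red a
  deriving (Eq, Show)

-- Color Bool: the type constructor applied to Bool.
favoriteAsBool :: Color Bool
favoriteAsBool = Green False

-- Color Int: the same type constructor applied to Int.
favoriteAsInt :: Color Int
favoriteAsInt = Green 50

-- Hypothetical helper that works for every choice of a.
intensity :: Color a -> a
intensity (Blue x)  = x
intensity (Green x) = x
intensity (Red x)   = x
```

Note that Color on its own is no longer a type you can put in a signature for a value; you always have to say Color of what.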
Now we figured out what data constructors and type constructors are, and how data constructors can take other values as arguments and type constructors can take other types as arguments. HTH.
It's about types: In the first case, you set the types String
(for company and model) and Int
for year. In the second case, you are more generic. a
, b
, and c
may be the very same types as in the first example, or something completely different. E.g., it may be useful to give the year as string instead of integer. And if you want, you may even use your Color
type.
Haskell has algebraic data types, which very few other languages have. This is perhaps what's confusing you.
In other languages, you can usually make a "record", "struct" or similar, which has a bunch of named fields that hold various different types of data. You can also sometimes make an "enumeration", which has a (small) set of fixed possible values (e.g., your Red
, Green
and Blue
).
In Haskell, you can combine both of these at the same time. Weird, but true!
Why is it called "algebraic"? Well, the nerds talk about "sum types" and "product types". For example:
data Eg1 = One Int | Two String
An Eg1
value is basically either an integer or a string. So the set of all possible Eg1
values is the "sum" of the set of all possible integer values and all possible string values. Thus, nerds refer to Eg1
as a "sum type". On the other hand:
data Eg2 = Pair Int String
Every Eg2
value consists of both an integer and a string. So the set of all possible Eg2
values is the Cartesian product of the set of all integers and the set of all strings. The two sets are "multiplied" together, so this is a "product type".
Haskell's algebraic types are sum types of product types. You give a constructor multiple fields to make a product type, and you have multiple constructors to make a sum (of products).
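The "sum" and "product" names are easiest to see by counting values of small types. Int and String have too many values to list, so this sketch swaps in Bool (2 values) and a three-valued Color; the names SumEg, ProdEg and the list names are made up for the illustration.

```haskell
data Color = Blue | Green | Red
  deriving (Eq, Show, Enum, Bounded)

-- Sum: a value is either a Bool or a Color.
data SumEg = FromBool Bool | FromColor Color
  deriving (Eq, Show)

-- Product: a value holds both a Bool and a Color.
data ProdEg = Both Bool Color
  deriving (Eq, Show)

allBools :: [Bool]
allBools = [minBound .. maxBound]

allColors :: [Color]
allColors = [minBound .. maxBound]

-- 2 + 3 = 5 possible values.
allSums :: [SumEg]
allSums = map FromBool allBools ++ map FromColor allColors

-- 2 * 3 = 6 possible values.
allProds :: [ProdEg]
allProds = [Both b c | b <- allBools, c <- allColors]
```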
As an example of why that might be useful, suppose you have something that outputs data as either XML or JSON, and it takes a configuration record - but obviously, the configuration settings for XML and for JSON are totally different. So you might do something like this:
data Config = XML_Config {...} | JSON_Config {...}
(With some suitable fields in there, obviously.) You can't do stuff like this in normal programming languages, which is why most people aren't used to it.
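For illustration only, here is what such a type could look like. The field names (xmlRootTag, xmlIndent, jsonPretty) and the describe function are invented for this sketch; they are not the fields elided above.

```haskell
-- A sum of two record-style products. All field names are hypothetical.
data Config
  = XML_Config  { xmlRootTag :: String, xmlIndent :: Int }
  | JSON_Config { jsonPretty :: Bool }
  deriving (Eq, Show)

-- Pattern matching picks the branch, and each branch only sees
-- the settings that make sense for that output format.
describe :: Config -> String
describe (XML_Config root indent) =
  "XML, root <" ++ root ++ ">, indent " ++ show indent
describe (JSON_Config pretty) =
  "JSON, " ++ (if pretty then "pretty-printed" else "compact")
```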
As others pointed out, polymorphism isn't that terribly useful here. Let's look at another example you're probably already familiar with:
data Maybe a = Just a | Nothing
This type has two data constructors. Nothing
is somewhat boring, it doesn't contain any useful data. On the other hand Just
contains a value of a
- whatever type a
may have. Let's write a function which uses this type, e.g. getting the head of an Int
list, if there is any (I hope you agree this is more useful than throwing an error):
maybeHead :: [Int] -> Maybe Int
maybeHead [] = Nothing
maybeHead (x:_) = Just x
> maybeHead [1,2,3] -- Just 1
> maybeHead [] -- Nothing
So in this case a
is an Int
, but it would work as well for any other type. In fact you can make our function work for every type of list (even without changing the implementation):
maybeHead :: [t] -> Maybe t
maybeHead [] = Nothing
maybeHead (x:_) = Just x
On the other hand you can write functions which accept only a certain type of Maybe
, e.g.
doubleMaybe :: Maybe Int -> Maybe Int
doubleMaybe (Just x) = Just (2*x)
doubleMaybe Nothing  = Nothing
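Put together in one file, the two functions compose nicely. The name doubleHead is made up for this sketch. Note the parentheses around Just x in the pattern; without them the definition does not compile.

```haskell
-- Works for lists of any element type.
maybeHead :: [t] -> Maybe t
maybeHead []    = Nothing
maybeHead (x:_) = Just x

-- Only accepts Maybe Int, because it needs to multiply.
doubleMaybe :: Maybe Int -> Maybe Int
doubleMaybe (Just x) = Just (2 * x)
doubleMaybe Nothing  = Nothing

-- Hypothetical combination: double the first element, if there is one.
doubleHead :: [Int] -> Maybe Int
doubleHead = doubleMaybe . maybeHead
```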
So long story short, with polymorphism you give your own type the flexibility to work with values of different other types.
In your example, you may decide at some point that String
isn't sufficient to identify the company, but it needs to have its own type Company
(which holds additional data like country, address, bank accounts etc). Your first implementation of Car
would need to change to use Company
instead of String
for its first value. Your second implementation is just fine, you use it as Car Company String Int
and it would work as before (of course functions accessing company data need to be changed).
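A rough sketch of that scenario, assuming the polymorphic Car from the question is a record with fields company, model and year. The Company type, its fields and the sample values are invented here.

```haskell
-- Assumed shape of the polymorphic Car (three type parameters).
data Car a b c = Car { company :: a, model :: b, year :: c }
  deriving (Eq, Show)

-- Hypothetical richer company type.
data Company = Company { companyName :: String, country :: String }
  deriving (Eq, Show)

oldCar :: Car String String Int
oldCar = Car "Ford" "Mustang" 1967

newCar :: Car Company String Int
newCar = Car (Company "Ford" "USA") "Mustang" 1967

-- Never looks at the company, so it is unaffected by the change.
modelAndYear :: Car a String Int -> String
modelAndYear car = model car ++ " (" ++ show (year car) ++ ")"
```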
In a data
declaration, a type constructor is the thing on the left hand side of the equals sign. The data constructor(s) are the things on the right hand side of the equals sign. You use type constructors where a type is expected, and you use data constructors where a value is expected.
To make things simple, we can start with an example of a type that represents a colour.
data Colour = Red | Green | Blue
Here, we have three data constructors. Colour
is a type, and Green
is a constructor that contains a value of type Colour
. Similarly, Red
and Blue
are both constructors that construct values of type Colour
. We could imagine spicing it up though!
data Colour = RGB Int Int Int
We still have just the type Colour
, but RGB
is not a value – it's a function taking three Ints and returning a value! RGB
has the type
RGB :: Int -> Int -> Int -> Colour
RGB
is a data constructor that is a function taking some values as its arguments, and then uses those to construct a new value. If you have done any object-oriented programming, you should recognise this. In OOP, constructors also take some values as arguments and return a new value!
In this case, if we apply RGB
to three values, we get a colour value!
Prelude> RGB 12 92 27
#0c5c1b
We have constructed a value of type Colour
by applying the data constructor. A data constructor either contains a value like a variable would, or takes other values as its argument and creates a new value. If you have done previous programming, this concept shouldn't be very strange to you.
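A plain data declaration won't print like that in GHCi by itself. One way to get hex output like the above is a hand-written Show instance, sketched here with printf from the standard Text.Printf module; this is my guess at how such output could be produced, not necessarily what was used.

```haskell
import Text.Printf (printf)

data Colour = RGB Int Int Int

-- Render each component as two lowercase hex digits.
-- Assumes components are in the range 0..255.
instance Show Colour where
  show (RGB r g b) = printf "#%02x%02x%02x" r g b

sample :: Colour
sample = RGB 12 92 27
```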
If you wanted to construct a binary tree to store String
s, you could imagine doing something like
data SBTree = Leaf String
| Branch String SBTree SBTree
What we see here is a type SBTree
that contains two data constructors. In other words, there are two functions (namely Leaf
and Branch
) that will construct values of the SBTree
type. If you're not familiar with how binary trees work, just hang in there. You don't actually need to know how binary trees work, only that this one stores String
s in some way.
We also see that both data constructors take a String
argument – this is the String they are going to store in the tree.
But! What if we also wanted to be able to store Bool
s? We'd have to create a new binary tree type. It could look something like this:
data BBTree = Leaf Bool
| Branch Bool BBTree BBTree
Both SBTree
and BBTree
are type constructors. But there's a glaring problem. Do you see how similar they are? That's a sign that you really want a parameter somewhere.
So we can do this:
data BTree a = Leaf a
| Branch a (BTree a) (BTree a)
Now we introduce a type variable a
as a parameter to the type constructor. In this declaration, BTree
has become a function. It takes a type as its argument and it returns a new type.
It is important here to consider the difference between a concrete type (examples include
Int
, [Char]
and Maybe Bool
) which is a type that can be assigned to a value in your program, and a type constructor function which you need to feed a type to be able to be assigned to a value. A value can never be of type "list", because it needs to be a "list of something". In the same spirit, a value can never be of type "binary tree", because it needs to be a "binary tree storing something".
If we pass in, say, Bool
as an argument to BTree
, it returns the type BTree Bool
, which is a binary tree that stores Bool
s. Replace every occurrence of the type variable a
with the type Bool
, and you can see for yourself how it's true.
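As a sketch, here is the one declaration serving both purposes. The sample trees and the toList helper are made up for illustration.

```haskell
data BTree a = Leaf a
             | Branch a (BTree a) (BTree a)
  deriving (Eq, Show)

-- BTree applied to Bool: every a in the declaration becomes Bool.
boolTree :: BTree Bool
boolTree = Branch True (Leaf False) (Leaf True)

-- BTree applied to String: the same declaration, no copy needed.
stringTree :: BTree String
stringTree = Branch "m" (Leaf "a") (Leaf "z")

-- Hypothetical helper: left subtree, then the node, then the right subtree.
-- It is written once and works for every element type.
toList :: BTree a -> [a]
toList (Leaf x)       = [x]
toList (Branch x l r) = toList l ++ [x] ++ toList r
```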
If you want to, you can view BTree
as a function with the kind
BTree :: * -> *
Kinds are somewhat like types – the *
indicates a concrete type, so we say BTree
is from a concrete type to a concrete type.
Step back here a moment and take note of the similarities.
A data constructor is a "function" that takes 0 or more values and gives you back a new value.
A type constructor is a "function" that takes 0 or more types and gives you back a new type.
Data constructors with parameters are cool if we want slight variations in our values – we put those variations in parameters and let the guy who creates the value decide what arguments they are going to put in. In the same sense, type constructors with parameters are cool if we want slight variations in our types! We put those variations as parameters and let the guy who creates the type decide what arguments they are going to put in.
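Because data constructors with arguments really are functions, you can pass them around like any other function. A small sketch, where the list names and withFullRed are invented for the example:

```haskell
data Colour = RGB Int Int Int
  deriving (Eq, Show)

-- Just :: a -> Maybe a, so it can be mapped over a list.
justs :: [Maybe Int]
justs = map Just [1, 2, 3]

-- RGB :: Int -> Int -> Int -> Colour, so it fits zipWith3.
primaries :: [Colour]
primaries = zipWith3 RGB [255, 0, 0] [0, 255, 0] [0, 0, 255]

-- Partial application works too: fix the red component first.
withFullRed :: Int -> Int -> Colour
withFullRed = RGB 255
```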
As the home stretch here, we can consider the Maybe a
type. Its definition is
data Maybe a = Nothing
| Just a
Here, Maybe
is a type constructor that returns a concrete type. Just
is a data constructor that returns a value. Nothing
is a data constructor that contains a value. If we look at the type of Just
, we see that
Just :: a -> Maybe a
In other words, Just
takes a value of type a
and returns a value of type Maybe a
. If we look at the kind of Maybe
, we see that
Maybe :: * -> *
In other words, Maybe
takes a concrete type and returns a concrete type.
Once again! The difference between a concrete type and a type constructor function. You cannot create a list of Maybe
s - if you try to execute
[] :: [Maybe]
you'll get an error. You can however create a list of Maybe Int
, or Maybe a
. That's because Maybe
is a type constructor function, but a list needs to contain values of a concrete type. Maybe Int
and Maybe a
are concrete types (or if you want, calls to type constructor functions that return concrete types.)
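A short sketch of the working and non-working versions. The names are made up; catMaybes comes from the standard Data.Maybe module.

```haskell
import Data.Maybe (catMaybes)

-- Fine: Maybe Int is a concrete type, so a list of it makes sense.
noMaybes :: [Maybe Int]
noMaybes = []

someMaybes :: [Maybe Int]
someMaybes = [Just 1, Nothing, Just 3]

-- Keep only the values wrapped in Just.
presentValues :: [Int]
presentValues = catMaybes someMaybes

-- Uncommenting the next two lines should make the compiler reject the
-- file, because Maybe alone still expects a type argument:
-- broken :: [Maybe]
-- broken = []
```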
The second one has the notion of "polymorphism" in it.
The a b c
can be of any type. For example, a
can be a [String]
, b
can be [Int]
and c
can be [Char]
.
While the first one's type is fixed: company is a String
, model is a String
and year is Int
.
The Car example might not show the significance of using polymorphism. But imagine your data is of the list type. A list can contain String, Char, Int ...
In those situations, you will need the second way of defining your data.
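As far as I understand it, a hand-rolled list shows why. This is only a sketch; the names List, Nil, Cons and len are my own, and the built-in list works along the same lines.

```haskell
-- One declaration covers lists of any element type.
data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

names :: List String
names = Cons "Ford" (Cons "Fiat" Nil)

years :: List Int
years = Cons 1967 (Cons 1970 (Cons 1999 Nil))

-- Written once, usable for every List a.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs
```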
As to the third way I don't think it needs to fit into the previous type. It's just one other way of defining data in Haskell.
This is my humble opinion as a beginner myself.
Btw: Make sure that you train your brain well and feel comfortable with this. It is the key to understanding monads later.