What are the differences between typeclasses and Abstract Data Types?
I realize this is a basic thing for Haskell programmers, but I come from a Scala background, and wo
Your question actually touches on three distinct concepts: typeclasses, abstract data types and algebraic data types. Confusingly enough, both "abstract" and "algebraic" data types can be abbreviated as "ADT"; in a Haskell context, ADT almost always means "algebraic".
So let's define all three terms.
An algebraic data type (ADT) is a type that can be made by combining simpler types. The core idea here is a "constructor", which is a symbol that defines a value. Think of this like a value in a Java-style enum, except it can also take arguments. The simplest algebraic data type has just one constructor with no arguments:
data Foo = Bar
There is only one¹ value of this type: Bar. By itself, this is not very interesting; we need some way to build up bigger types.
The first way is to give our constructor arguments. For example, we can have our Bars take an Int and a String:
data Foo = Bar Int String
Now Foo has many different possible values: Bar 0 "baz", Bar 100 "abc" and so on. A more realistic example might be a record for an employee, looking something like this:
data Employee = Employee String String Int
The other way to build up more complicated types is by having multiple constructors to choose from. For example, we can have both a Bar and a Baz:
data Foo = Bar
         | Baz
Now values of type Foo can be either Bar or Baz. This is in fact exactly how booleans work; Bool is defined as follows:
data Bool = False
          | True
It works exactly how you'd expect. The really interesting types combine both methods. As a rather contrived example, imagine shapes:
data Shape = Rectangle Point Point
           | Circle Point Int
A shape can be either a rectangle, defined by two of its corners, or a circle, defined by a center and a radius. (We'll just define Point as (Int, Int).) Fair enough. But here, we run into a snag: it turns out that other shapes also exist! If some heretic who believes in triangles wants to use our type in their model, can they add a Triangle constructor after the fact? Unfortunately not: in Haskell, algebraic data types are closed, which means you cannot add new alternatives once the type has been defined.
One important thing you can do with an algebraic data type is pattern match on it. This basically means being able to branch on the alternatives of an ADT. As a very simple example, instead of using an if expression, you could pattern match on Bool:
case myBool of
  True → ... -- true case
  False → ... -- false case
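To make the Bool example concrete, here is a small sketch that fills in the two branches. The names describe and describe' are made up for illustration, and the arrows are written in ASCII (->), which GHC accepts without any extensions.

```haskell
-- Hypothetical helper: branches on a Bool by pattern matching.
describe :: Bool -> String
describe myBool = case myBool of
  True  -> "yes"  -- true case
  False -> "no"   -- false case

-- The same function written with an if expression, for comparison.
describe' :: Bool -> String
describe' myBool = if myBool then "yes" else "no"
```

Both versions behave identically; the case form is the one that generalises to types with more than two constructors.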
If your constructors have arguments, you can also access those values by pattern matching. Using Shape from above, we can write a simple area function:
area shape = case shape of
  Rectangle (x₁, y₁) (x₂, y₂) → (x₂ - x₁) * (y₂ - y₁)
  Circle _ r → π * r ^ 2
The _ just means we don't care about the value of the circle's center.
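The area function above is written in a stylised notation. A compilable sketch might look like the following, under a few assumptions that differ from the snippet: coordinates and the radius are Double rather than Int (multiplying by pi needs a fractional type), pi and -> stand in for π and →, and abs is added so the two corners can be given in either order.

```haskell
-- Assumption: Double coordinates, unlike the (Int, Int) used in the text.
type Point = (Double, Double)

data Shape
  = Rectangle Point Point  -- two opposite corners
  | Circle Point Double    -- center and radius

area :: Shape -> Double
area shape = case shape of
  -- Width times height; abs makes the order of the corners irrelevant.
  Rectangle (x1, y1) (x2, y2) -> abs (x2 - x1) * abs (y2 - y1)
  -- The center does not affect the area, so it is ignored with _.
  Circle _ r                  -> pi * r ^ 2
```

For instance, area (Rectangle (0, 0) (2, 3)) evaluates to 6.0.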
This is just a basic overview of algebraic data types: it turns out there's quite a bit more fun to be had. You might want to take a look at the relevant chapter in Learn You a Haskell (LYAH for short) for more reading.
Now, what about abstract data types? This refers to a different concept. An abstract data type is one where the implementation is not exposed: you don't know what the values of the type actually look like. The only thing you can do with it is apply functions exported from its module. You can't pattern match on it or construct new values yourself. A good example in practice is Map (from Data.Map). The map is actually a particular kind of binary search tree, but nothing in the module lets you work with the tree structure directly. This is important because the tree needs to maintain certain additional invariants which you could easily mess up. So you only ever use Map as an opaque blob.
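As a rough sketch of how a type is made abstract in Haskell: you put it in a module whose export list names the type but not its constructor. Everything below (SortedList, emptySorted, insertSorted, toAscList) is a hypothetical example invented for illustration, not how Data.Map is actually implemented.

```haskell
-- In a real project this would live in its own module, with an export
-- list that leaves out the constructor, along the lines of:
--
--   module SortedList (SortedList, emptySorted, insertSorted, toAscList) where
--
-- Outside that module, nobody could pattern match on SortedList or build
-- one from an arbitrary list.

import qualified Data.List as List

-- Invariant: the wrapped list is always in ascending order.
newtype SortedList a = SortedList [a]

-- The only way to start: an empty list trivially satisfies the invariant.
emptySorted :: SortedList a
emptySorted = SortedList []

-- Inserting at the right position preserves the invariant.
insertSorted :: Ord a => a -> SortedList a -> SortedList a
insertSorted x (SortedList xs) = SortedList (List.insert x xs)

-- Read-only view of the contents, smallest element first.
toAscList :: SortedList a -> [a]
toAscList (SortedList xs) = xs
```

Because every exported function keeps the list sorted, client code cannot break the invariant, which is the same reason Map hides its tree.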
Algebraic and abstract types are somewhat orthogonal concepts; it's rather unfortunate that their names make it so easy to mistake one for the other.
The final piece of the puzzle is the typeclass. A typeclass, unlike algebraic and abstract data types, is not a type itself. Rather, think of a typeclass as a set of types. In particular, a typeclass is the set of all types that implement certain functions.
The simplest example is Show, which is the class of all types that have a string representation; that is, all types a for which we have a function show ∷ a → String. If a type has a show function, we say it is "in Show"; otherwise, it isn't. Most types you know, like Int, Bool and String, are in Show; on the other hand, functions (any type with a →) are not in Show. This is why GHCi cannot print a function.
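A few quick uses of show from the Prelude illustrate the point. The binding names (shownInt and so on) are just labels chosen for this sketch.

```haskell
-- Int is in Show.
shownInt :: String
shownInt = show (42 :: Int)   -- "42"

-- Bool is in Show.
shownBool :: String
shownBool = show True         -- "True"

-- String is in Show; note that the quotes become part of the result.
shownString :: String
shownString = show "hi"       -- "\"hi\""

-- Uncommenting the next line makes compilation fail, because the Prelude
-- provides no Show instance for function types:
-- shownFunction = show (\x -> x + 1)
```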
A typeclass is defined by which functions a type needs to implement to be part of it. For example, Show could be defined² just by the show function:
class Show a where
  show ∷ a → String
Now to add a new type like Foo to Show, we have to write an instance for it. This is the actual implementation of the show function:
instance Show Foo where
  show foo = case foo of
    Bar → "Bar"
    Baz → "Baz"
After this, Foo is in Show. We can write an instance for Foo anywhere. In particular, we can write new instances after the class has been defined, even in other modules. This is what it means for typeclasses to be open; unlike algebraic data types, we can add new things to typeclasses after the fact.
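Here is a minimal sketch of that openness, using a made-up class so it does not clash with the real Show. Describable, Colour and describeAll are hypothetical names for illustration only.

```haskell
-- Hypothetical class; not part of any library.
class Describable a where
  describe :: a -> String

-- An instance for a type we did not define: Bool comes from the Prelude.
instance Describable Bool where
  describe True  = "yes"
  describe False = "no"

-- A type defined later (possibly in another module) can join the class too.
data Colour = Red | Green

instance Describable Colour where
  describe Red   = "red"
  describe Green = "green"

-- Works for any type that is in Describable, including ones added later.
describeAll :: Describable a => [a] -> [String]
describeAll = map describe
```

Contrast this with Colour itself: adding a third constructor would mean editing the data declaration, whereas adding a third instance needs no change to the class.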
There is more to typeclasses too; you can read about them in the same LYAH chapter.
¹ Technically, there is another value called ⊥ (bottom) as well, but we'll ignore it for now. You can learn about ⊥ later.
² In reality, Show has two more methods, showsPrec and showList. The latter renders a whole list of as at once; it is basically a hack to make strings look pretty, since a string is just a list of Chars rather than its own type.