I'm pretty new to Haskell programming. I'm trying to get to grips with its classes, data, instances and newtype. Here is what I've understood:
data NewData = Const
Another way to think about the Haskell data structure is this "discriminated union" construction in C:
typedef enum { constr1, constr2 } NewDataEnum;
typedef struct {
    NewDataEnum _discriminator;
    union {
        struct { int a, b; } _ConStr1;
        struct { float a, b; } _ConStr2;
    } _union;
} NewData;
Note that in order to access any of the Int or Float values in the Haskell type you have to pattern match the constructor, and this corresponds to looking at the value of the _discriminator field.
For example, this Haskell function:
foo :: NewData -> Bool
foo (ConStr1 a b) = a + b > 0
foo (ConStr2 a b) = a * b < 3
could be implemented as this C function:
int foo(NewData n) {
    switch (n._discriminator) {
    case constr1: return n._union._ConStr1.a + n._union._ConStr1.b > 0;
    case constr2: return n._union._ConStr2.a * n._union._ConStr2.b < 3;
    }
    // will never get here as long as _discriminator holds a valid tag
    return 0;
}
For completeness, here is the implementation of the constructor ConStr1 using the above C definitions:
NewData ConStr1(int a, int b) {
    NewData r;
    r._discriminator = constr1;
    r._union._ConStr1.a = a;
    r._union._ConStr1.b = b;
    return r;
}
Java and C# don't have direct support for unions. In a C union all of the fields of the union are assigned the same offset within the containing structure, so the size of the union is the size of its largest member. I've seen C# code which doesn't worry about wasting space and simply uses a struct for a union. Here is an MSDN article which discusses how to get the overlapping effect that C-style unions have.
Algebraic data types are in many ways complementary to objects - things that are easy to do with one are difficult to do with the other - and so it is not surprising that they don't translate well to an OO implementation. Any discussion of the "Expression Problem" usually highlights the complementary nature of these two systems.
Objects, type classes and algebraic data types may be thought of as different ways to efficiently transfer control by means of a jump table, but the location of this table is different in each of these cases.
Finally, it should be emphasized that in Haskell you specify very few implementation details of algebraic data types (ADTs). The discriminated union construction is a useful way to think about ADTs in concrete terms, but Haskell compilers are not required to implement them in any specific way.
To make sum types like
data NewData = Constr1 Int Int | Constr2 String Float
I usually do the following in C#:
interface INewDataVisitor<out R> {
    R Constr1(Constr1 constructor);
    R Constr2(Constr2 constructor);
}

interface INewData {
    R Accept<R>(INewDataVisitor<R> visitor);
}

class Constr1 : INewData {
    private readonly int _a;
    private readonly int _b;

    public Constr1(int a, int b) {
        _a = a;
        _b = b;
    }

    public int a { get { return _a; } }
    public int b { get { return _b; } }

    // members that implement an interface must be public
    public R Accept<R>(INewDataVisitor<R> visitor) {
        return visitor.Constr1(this);
    }
}

class Constr2 : INewData {
    private readonly string _a;
    private readonly float _b;

    public Constr2(string a, float b) {
        _a = a;
        _b = b;
    }

    public string a { get { return _a; } }
    public float b { get { return _b; } }

    public R Accept<R>(INewDataVisitor<R> visitor) {
        return visitor.Constr2(this);
    }
}
This isn't quite the same in terms of type safety because an INewData can also be null, and an implementation of Accept might never call a method on the visitor and just return default(R), might call the visitor multiple times, or do any other silly thing.
A C# interface like
interface SomeInterface<T> {
    bool method1(List<T> someParam);
}
is really more like the following in Haskell:
data SomeInterface t = SomeInterface {
    method1 :: [t] -> Bool
}
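To make the record-of-functions reading concrete, here is a minimal sketch. The record mirrors the one above; allPositive, nonEmpty and check are hypothetical names invented for illustration, playing the role that two implementing classes and a caller would play in C#.

```haskell
-- An "interface" is just a record whose fields are functions.
data SomeInterface t = SomeInterface
  { method1 :: [t] -> Bool
  }

-- Two hypothetical "implementations": ordinary values of the record type.
allPositive :: SomeInterface Int
allPositive = SomeInterface { method1 = all (> 0) }

nonEmpty :: SomeInterface t
nonEmpty = SomeInterface { method1 = not . null }

-- A caller receives the record and selects the method from it.
check :: SomeInterface t -> [t] -> Bool
check iface xs = method1 iface xs
```

Picking an implementation is then just passing a different value, e.g. check allPositive [1, 2, 3] versus check nonEmpty "abc".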
The data types of Haskell are not exactly the same as any particular C# construct. The best you can hope for is to get a simulation of some features. It's really best to understand Haskell types on their own terms. But I'll take a stab at it.
I don't have a C# compiler handy, but I am referring to the documentation to hopefully produce something close to correct. I'll edit later to fix errors if they're pointed out to me.
First of all, an algebraic data type in Haskell is closest to a family of OO classes rather than a single class. The parent class is completely abstract aside from a single field that discriminates the concrete subclasses. All public users of the type must accept only the parent class, and then perform case analysis via the discriminator field and do a type-cast to the more specific subclass indicated by the discriminator.
class NewData {
    // every piece of NewData may take one of two forms:
    public enum Constructor { C1, C2 }

    // each piece of data has a discriminator tag; this is the only structure
    // they all have in common.
    public Constructor discriminator;

    // can't construct a NewData directly
    private NewData() {}

    // private nested subclasses for the derived types
    private class Constr1Class : NewData {
        public int a, b;
        public Constr1Class(int a, int b) {
            this.discriminator = Constructor.C1;
            this.a = a;
            this.b = b;
        }
    }

    private class Constr2Class : NewData {
        public string c;
        public float d;
        public Constr2Class(string c, float d) {
            this.discriminator = Constructor.C2;
            this.c = c;
            this.d = d;
        }
    }

    // A bunch of static functions for creating and extracting
    // I'm not sure C# will be happy with these, but hopefully it is clear
    // that they construct one of the derived private class objects and
    // return it as a parent class object
    public static NewData Constr1(int a, int b) {
        return new Constr1Class(a, b);
    }

    public static NewData Constr2(string c, float d) {
        return new Constr2Class(c, d);
    }

    // We can't directly get at the members since they don't exist
    // in the parent class; we could define abstract methods to get them,
    // but I think that obscures what's really happening. You are expected
    // to check the discriminator field first to ensure you won't get a
    // runtime type cast error.
    public static int getA(NewData data) {
        Constr1Class d1 = (Constr1Class)data;
        return d1.a;
    }

    public static int getB(NewData data) {
        Constr1Class d1 = (Constr1Class)data;
        return d1.b;
    }

    public static string getC(NewData data) {
        Constr2Class d2 = (Constr2Class)data;
        return d2.c;
    }

    public static float getD(NewData data) {
        Constr2Class d2 = (Constr2Class)data;
        return d2.d;
    }
}
No doubt you will criticize this as terrible OO code. It certainly is! Haskell's algebraic data types do not claim to be Objects in the Object-Oriented sense. But it should at least give you a sense of how ADTs work.
As for type classes, they do not have anything to do with object-oriented classes. If you squint, they look kind of like a C# Interface, but they are not! For one, type classes can provide default implementations. Type class resolution is also purely static; it has nothing to do with run-time dispatch, as the functions that will be called have all been determined at compile time. Sometimes, the instance of the type class that will be used depends on the return type of a function call rather than any of the parameters. You are best off not even trying to translate it into OO terminology, because they're not the same thing.
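A small sketch of the return-type point, which has no direct analogue in interface dispatch. The class Default and its method def are hypothetical, made up for this illustration; the method takes no arguments, so the only thing that can select an instance is the type the caller expects back.

```haskell
-- A hypothetical class whose only method takes no arguments, so the
-- instance can only be chosen from the type the caller expects back.
class Default a where
  def :: a

instance Default Int where
  def = 0

instance Default Bool where
  def = False

-- The type signature on the result is what selects the instance.
defaultInt :: Int
defaultInt = def

defaultBool :: Bool
defaultBool = def
```

Both definitions have the same right-hand side; only their declared types differ, and that is enough for the compiler to pick the instance.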
GHC's implementation of type classes actually works by creating a dictionary that is passed as an implicit parameter to a function that has a type class constraint in its signature. I.e., if the type looks like Num a => a -> a -> a, the compiler will pass an extra parameter with the dictionary of the Num-specific functions for the actual type substituted for the type variable a at that call site. So, if the function was called with Int parameters, it would get an extra dictionary parameter with functions from the Int instance of Num.
In essence, the signature is saying "This function is polymorphic as long as you can supply the operations in the Num type class for the type you want to use" and the compiler does provide them as an extra parameter to the function.
That being said, GHC is sometimes able to optimize away the whole extra dictionary parameter entirely and just inline the necessary functions.
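The dictionary-passing idea can be imitated by hand. The following is only a sketch of the concept, not GHC's actual internal representation: NumDict, numDictInt and sumOfSquares' are hypothetical names, and the dictionary is cut down to two operations.

```haskell
-- Hypothetical explicit dictionary for a cut-down Num class.
data NumDict a = NumDict
  { plus  :: a -> a -> a
  , times :: a -> a -> a
  }

-- The "instance" for Int is an ordinary value.
numDictInt :: NumDict Int
numDictInt = NumDict { plus = (+), times = (*) }

-- What the programmer writes: the constraint is implicit.
sumOfSquares :: Num a => a -> a -> a
sumOfSquares x y = x * x + y * y

-- Roughly what it becomes: the dictionary is an ordinary extra argument.
sumOfSquares' :: NumDict a -> a -> a -> a
sumOfSquares' d x y = plus d (times d x x) (times d y y)
```

Calling sumOfSquares at type Int corresponds to calling sumOfSquares' numDictInt in the hand-written version; the compiler simply fills in that first argument for you.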
As others have said, it's more like a discriminated union - which is an obscure construct that only C / C++ programmers are likely to have heard of.
You can kind of simulate this in an OO language by having an abstract base class for Haskell's "type", with a concrete subclass for each of Haskell's "constructors". In particular, your code fragment says that every NewData object has four fields; this is incorrect. You can do something like this:
data Stuff = Small Int | Big String Double Bool
Now if I write Small 5, this is a Stuff value with only 1 field inside it. (It takes up that amount of RAM.) But if I do Big "Foo" 7.3 True, this is also a value of type Stuff, but it contains 3 fields (and takes up that much RAM).
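As a quick sketch of how code consumes such a value: pattern matching looks at which constructor was used and binds exactly the fields that constructor carries. The function name describe is made up for this example.

```haskell
data Stuff = Small Int | Big String Double Bool

-- Pattern matching inspects the constructor and binds exactly the
-- fields that this particular constructor carries.
describe :: Stuff -> String
describe (Small n)   = "Small with 1 field: " ++ show n
describe (Big s d b) = "Big with 3 fields: " ++ s ++ ", " ++ show d ++ ", " ++ show b
```

There is no way to ask a Small value for a String field; the only way to reach the String is through the Big pattern.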
Notice that the constructor name itself is part of the data. That's why you can do something like
data Colour = Red | Green | Blue
Now there are three constructors, each with zero fields. The constructor itself is the data. Now, C# lets you do
enum Colour {Red, Green, Blue}
But that's really just saying
Colour = int;
const int Red = 0;
const int Green = 1;
const int Blue = 2;
Note, in particular, that with an explicit cast you may say
Colour temp = (Colour)52;
By contrast, in Haskell a variable of type Colour can only contain Red, Green or Blue, and these are not in any way integers. You can define a function to convert them to integers if you like, but that's not how the compiler stores them.
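A minimal sketch of what "define a function to convert them" can look like. colourCode and allColours are hypothetical names; the deriving clause is an addition for this example that asks the compiler to generate the standard Enum and Bounded conversions instead of writing them by hand.

```haskell
data Colour = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- An explicit, hand-written conversion (hypothetical name).
colourCode :: Colour -> Int
colourCode Red   = 0
colourCode Green = 1
colourCode Blue  = 2

-- With Enum and Bounded derived, every value can be listed.
allColours :: [Colour]
allColours = [minBound .. maxBound]
```

Either way the conversion is something you call explicitly; an Int is never accepted where a Colour is expected.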
Your comment about getters and setters illustrates the pitfalls of this approach; in Haskell, we don't usually worry about getters and setters. Simply defining a type is sufficient to create values of that type and to access their contents. It's sort of vaguely like a C# struct with all fields marked public readonly. (When we do worry about getters, we usually call them "projection functions"...)
In OO, you use classes for encapsulation. In Haskell, you do this with modules. Inside a module, everything has access to everything (much like a class can access every part of itself). You use an export list to say what parts of the module are public to the outside world. In particular, you can make a type name public, while completely hiding its internal structure. Then the only way to create or manipulate values of that type are the functions you expose from the module.
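Here is a small sketch of that kind of hiding. The module name Counter and everything in it are hypothetical, chosen only to show an export list that publishes a type name without its constructor.

```haskell
module Counter
  ( Counter      -- the type name is exported, its constructor is not
  , newCounter
  , increment
  , current
  ) where

-- MkCounter is absent from the export list, so code outside this
-- module cannot build or pattern match on a Counter directly.
data Counter = MkCounter Int

newCounter :: Counter
newCounter = MkCounter 0

increment :: Counter -> Counter
increment (MkCounter n) = MkCounter (n + 1)

current :: Counter -> Int
current (MkCounter n) = n
```

An importing module can only go through newCounter, increment and current, so it can never observe or create a Counter that these functions would not produce.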
You asked about newtype?
OK, the newtype keyword defines a new type name which is actually identical to an old type, but the type checker thinks it's something new and different. For example, an Int is just a normal number. But if I do
newtype UserID = ID Int
now UserID is a brand new type, completely unrelated to anything. But under the covers, it's really just another name for good old Int. What this means is that you can't use UserID where an Int is required - and you can't use Int where a UserID is required. So you can't mix up a user ID with some other random number just because they're both integers.
You can do exactly the same thing with data:
data UserID = ID Int
However, now we have a useless UserID structure that just contains a pointer to an integer. If we use newtype then a UserID is an integer, not a structure pointing to an integer. From the programmer's point of view, both definitions are equivalent; but under the hood, newtype is more efficient.
(Minor nit-pick: Actually to make them behave almost identically you need to say
data UserID = ID !Int
which means that the integer field is "strict". Don't worry about this yet.)
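To round off the newtype discussion, a short sketch of how the wrapper is used in practice. describeUser and userIdToInt are hypothetical helpers, and the deriving clause is added here only so values can be printed and compared.

```haskell
newtype UserID = ID Int
  deriving (Show, Eq)

-- Hypothetical function that only accepts a UserID.
describeUser :: UserID -> String
describeUser (ID n) = "user #" ++ show n

-- Going back to the underlying Int is an explicit unwrapping step.
userIdToInt :: UserID -> Int
userIdToInt (ID n) = n

-- describeUser (42 :: Int)   -- rejected by the type checker: Int is not UserID
```

Wrapping with ID and unwrapping by pattern matching are the only ways to cross between the two types, which is exactly what stops a user ID from being confused with some other integer.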