In the book Real World OCaml, the authors explain why OCaml uses let rec for defining recursive functions.
OCaml distinguishes between nonrecurs
I am not an expert, but I'll make a guess until the truly knowledgeable guys show up. In OCaml there can be side effects that happen during the definition of a function:
let rec f =
let () = Printf.printf "hello\n" in
fun x -> if x <= 0 then 12 else 1 + f (x - 1)
This means that the order of function definitions must be preserved in some sense. Now imagine that two distinct sets of mutually recursive functions are interleaved. It doesn't seem at all easy for the compiler to preserve the order while processing them as two separate mutually recursive sets of definitions.
The use of `let rec ... and` means that distinct sets of mutually recursive function definitions can't be interleaved in OCaml as they can in Haskell. Haskell doesn't have side effects (in some sense), so definitions can be freely reordered.
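To make the definition-time effect concrete, here is a minimal sketch. The events log and the functions f and g are hypothetical names introduced only for illustration; the point is that each effect runs once, when the definition is evaluated, and in source order.

```ocaml
(* Hypothetical log, used only to observe when definitions are evaluated. *)
let events : string list ref = ref []

let f =
  let () = events := "define f" :: !events in
  fun x -> x + 1

let g =
  let () = events := "define g" :: !events in
  fun x -> f x * 2

(* Calling the functions does not repeat the definition-time effects. *)
let result = g 3 + g 4

(* The log, oldest entry first. *)
let history = List.rev !events
```

If the compiler were free to reorder f and g, the contents of history could change, which is the problem described above.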
I'd say that in OCaml they are trying to make the REPL and source files work the same way. So, it's perfectly reasonable to redefine some function in the REPL; therefore, they have to allow it in the source as well. Now, if you use the (redefined) function in itself, OCaml needs some way of knowing which of the definitions to use: the previous one or the new one.
In Haskell they just gave up and accepted that the REPL works differently from source files.
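A small sketch of the two choices, using a hypothetical function f: without rec the name on the right-hand side refers to the previous definition, and with rec it refers to the definition being introduced.

```ocaml
let f x = x + 1

(* No rec: the f in the body is the previous f. *)
let f x = f x * 2
let a = f 3        (* (3 + 1) * 2 *)

(* With rec: the f in the body is the function being defined. *)
let rec f x = if x <= 0 then 0 else 1 + f (x - 1)
let b = f 3        (* counts down from 3 *)
```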
It's not a question of purity, it's a question of specifying what environment the typechecker should check an expression in. It actually gives you more power than you would have otherwise. For example (I'm going to write Standard ML here because I know it better than OCaml, but I believe the typechecking process is pretty much the same for the two languages), it lets you distinguish between these cases:
val foo : int = 5
val foo = fn (x) => if x = foo then 0 else 1
Now, as of the second definition, foo has the type int -> int. On the other hand,
val foo : int = 5
val rec foo = fn (x) => if x = foo then 0 else 1
does not typecheck, because the rec means that the typechecker has already decided that foo has been rebound to the type 'a -> int, and when it tries to figure out what that 'a needs to be, there is a unification failure because x = foo forces x and foo to have the same type, which would require 'a to equal 'a -> int.
It can certainly "look" more imperative, because the case without rec allows you to do things like this:
val foo : int = 5
val foo = foo + 1
val foo = foo + 1
and now foo has the value 7. That's not because it's been mutated, however; the name foo has been bound three times, and it just so happens that each later binding shadowed a previous binding of a variable named foo. It's the same as this:
val foo : int = 5
val foo' = foo + 1
val foo'' = foo' + 1
except that foo and foo' are no longer available in the environment after the identifier foo has been rebound. The following are also legal:
val foo : int = 5
val foo : real = 5.0
which makes it clearer that what's happening is shadowing of the original definition, rather than a side effect.
Whether or not it's stylistically a good idea to rebind identifiers is questionable -- it can get confusing. It can be useful in some situations (e.g. rebinding a function name to a version of itself that prints debugging output).
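As a rough OCaml sketch of the debugging use case (square is a hypothetical function chosen for illustration): because the second binding has no rec, the call inside it refers to the original function, so the wrapper does not loop.

```ocaml
let square x = x * x

(* Rebind square to a wrapper around the previous square. *)
let square x =
  let result = square x in
  Printf.printf "square %d = %d\n" x result;
  result

let nine = square 3
```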
I think this has nothing to do with being purely functional; it is just a design decision that in Haskell you are not allowed to do
let a = 0;;
let a = a + 1;;
whereas you can do it in Caml.
In Haskell this code won't work because let a = a + 1 is interpreted as a recursive definition and will not terminate.
In Haskell you don't have to specify that a definition is recursive simply because you can't create a non-recursive one (so the keyword rec is everywhere but is not written).
What are the technical reasons that enforce let rec in OCaml while pure functional languages do not?
Recursiveness is a strange beast. It has a relation to purity, but it's a little more oblique than this. To be clear, you could write an "alterna-Haskell" which retains its purity and its laziness but does not have recursively bound lets by default and demands some kind of rec marker just as OCaml does. Some would even prefer this.
In essence, there are just many different kinds of "let"s possible. If we compare let and let rec in OCaml we'll see a small difference. In static formal semantics, we might write
Γ ⊢ E : A      Γ, x : A ⊢ F : B
---------------------------------
Γ ⊢ let x = E in F : B
which says that if we can prove in a variable environment Γ that E has type A, and if we can prove in the same variable environment Γ augmented with x : A that F : B, then we can prove that in the variable environment Γ, let x = E in F has type B.
The thing to watch is the Γ argument. This is just a list of ("variable name", "type") pairs like [(x, int); (y, string)], and augmenting the list like Γ, x : A just means consing (x, A) onto it (sorry that the syntax is flipped).
Now let's write the same formalism for let rec:
Γ, x : A ⊢ E : A      Γ, x : A ⊢ F : B
----------------------------------------
Γ ⊢ let rec x = E in F : B
In particular, the only difference is that neither premise works in the plain Γ environment; both are allowed to assume the existence of the x variable.
In this sense, let and let rec are simply different beasts.
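The two rules can be sketched as a tiny checker. This is only an illustration under stated assumptions: the expr type, the Type_error exception, and the type annotation on LetRec (standing in for the A that the rule assumes) are all hypothetical, and real OCaml infers A rather than requiring it.

```ocaml
type ty = Int | Arr of ty * ty

type expr =
  | Var of string
  | Lit of int
  | Lam of string * ty * expr            (* fun (x : a) -> body *)
  | App of expr * expr
  | Let of string * expr * expr          (* let x = e in f *)
  | LetRec of string * ty * expr * expr  (* let rec (x : a) = e in f *)

exception Type_error of string

(* check env e returns the type of e under env, an association list
   playing the role of Γ. *)
let rec check env = function
  | Var x ->
    (match List.assoc_opt x env with
     | Some t -> t
     | None -> raise (Type_error ("unbound variable " ^ x)))
  | Lit _ -> Int
  | Lam (x, a, body) -> Arr (a, check ((x, a) :: env) body)
  | App (f, arg) ->
    (match check env f with
     | Arr (a, b) when check env arg = a -> b
     | _ -> raise (Type_error "bad application"))
  | Let (x, e, f) ->
    let a = check env e in          (* E is checked in the plain Γ *)
    check ((x, a) :: env) f
  | LetRec (x, a, e, f) ->
    let env' = (x, a) :: env in     (* both premises may assume x : A *)
    if check env' e <> a then raise (Type_error "annotation mismatch");
    check env' f
```

With this sketch, Let ("x", Var "x", Var "x") is rejected in the empty environment because x is unbound while E is checked, whereas LetRec ("x", Int, Var "x", Var "x") is accepted.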
So what does it mean to be pure? Under the strictest definition, which even Haskell doesn't satisfy, we must eliminate all effects including non-termination. The only way to achieve this is to pull away our ability to write unrestricted recursion and reintroduce it only carefully.
There exist plenty of languages without recursion. Perhaps the most important one is the Simply Typed Lambda Calculus. In its basic form it is regular lambda calculus but augmented with a typing discipline where types look a bit like
type ty =
| Base
| Arr of ty * ty
It turns out that STLC cannot represent recursion: the Y combinator, and all other fixed-point cousin combinators, cannot be typed. Thus, STLC is not Turing Complete.
It is however uncompromisingly pure. It achieves that purity with the bluntest of instruments, however, by completely outlawing recursion. What we'd really like is some kind of balanced, careful recursion which doesn't lead to non-termination---we'll still be Turing Incomplete, but not so crippled.
Some languages try this game. There are clever ways of adding typed recursion back along a division between data and codata which ensures that you cannot write non-terminating functions. If you're interested, I suggest learning a bit of Coq.
But OCaml's goal (and Haskell's as well) is not to be delicate here. Both languages are uncompromisingly Turing Complete (and therefore "practical"). So let's discuss some more blunt ways of augmenting the STLC with recursion.
The bluntest of the bunch is to add a single built-in function called fix
val fix : ('a -> 'a) -> 'a
or, in more genuine OCaml-y notation which requires eta-expansion
val fix : (('a -> 'b) -> ('a -> 'b)) -> ('a -> 'b)
Now, remember that we're only considering a primitive STLC with fix added. We can indeed write fix (the latter one at least) in OCaml, but that's cheating at the moment. What does fix buy the STLC as a primitive?
It turns out that the answer is: "everything". STLC + Fix (basically a language called PCF) is impure and Turing Complete. It's also simply tremendously difficult to use.
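For a feel of what programming with fix looks like, here is a sketch in OCaml. It defines the eta-expanded fix with let rec (the "cheating" mentioned above); fact_step is a hypothetical helper that is not itself recursive and instead receives the function to call for the recursive case.

```ocaml
(* The eta-expanded fix, written using let rec. *)
let rec fix f x = f (fix f) x

(* A non-recursive "step": self stands for the recursive call. *)
let fact_step self n = if n <= 1 then 1 else n * self (n - 1)

let fact = fix fact_step
let result = fact 5
```

Every recursive function has to be threaded through fix in this style, which hints at why it is awkward to use directly.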
So this is the final hurdle to jump: how do we make fix easier to work with? By adding recursive bindings!
Already, STLC has a let construction. You can think of it as just syntax sugar:
let x = E in F ----> (fun x -> F) (E)
but once we've added fix we also have the power to introduce let rec bindings
let rec x a = E in F ----> (fun x -> F) (fix (fun x a -> E))
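The translation above can be tried out in OCaml itself. In this sketch, sum is a hypothetical example function, and fix is again defined with let rec only because OCaml does not provide it as a primitive.

```ocaml
let rec fix f x = f (fix f) x

(* Direct version using let rec. *)
let direct =
  let rec sum n = if n = 0 then 0 else n + sum (n - 1) in
  sum 10

(* The same program after the translation: (fun x -> F) (fix (fun x a -> E)). *)
let desugared =
  (fun sum -> sum 10)
    (fix (fun sum n -> if n = 0 then 0 else n + sum (n - 1)))
```

Both bindings compute the same value, so the only recursion left in the second version lives inside fix.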
At this point it should again be clear: let and let rec are very different beasts. They embody different levels of linguistic power and let rec is a window to allow fundamental impurity through Turing Completeness and its partner-effect non-termination.
So, at the end of the day, it's a little amusing that Haskell, the purer of the two languages, made the interesting choice of abolishing plain let bindings. That's really the only difference: there is no syntax for representing a non-recursive binding in Haskell.
At this point it's essentially just a style decision. The authors of Haskell determined that recursive bindings were so useful that one might as well assume that every binding is recursive (and mutually so, a can of worms ignored in this answer so far).
On the other hand, OCaml gives you the ability to be totally explicit about the kind of binding you choose, let or let rec!
When you define the semantics of function definition, as a language designer, you have a choice: either make the name of the function visible in the scope of its own body, or not. Both choices are perfectly legitimate; for example, C-family languages, though far from functional, do have names of definitions visible in their own scope (this also extends to all definitions in C, making int x = x + 1 legal). The OCaml language gives us the extra flexibility of making the choice ourselves. And that's really great. They decided to make it invisible by default, a fairly decent solution, since most of the functions that we write are non-recursive.
As for the quote, it doesn't really correspond to function definitions, the most common use of the rec keyword. It is mostly about "Why the scope of function definition doesn't extend to the body of the module". This is a completely different question.
After some research I've found a very similar question that has an answer that might satisfy you. A quote from it:
So, given that the type checker needs to know about which sets of definitions are mutually recursive, what can it do? One possibility is to simply do a dependency analysis on all the definitions in a scope, and reorder them into the smallest possible groups. Haskell actually does this, but in languages like F# (and OCaml and SML) which have unrestricted side-effects, this is a bad idea because it might reorder the side-effects too. So instead it asks the user to explicitly mark which definitions are mutually recursive, and thus by extension where generalization should occur.
Even without any reordering, with arbitrary non-pure expressions that can occur in a function definition (a side effect of definition, not evaluation), it is impossible to build the dependency graph. Consider demarshaling and executing a function from a file.
To summarize, we have two usages of the let rec construct. One is to create a self-recursive function, like
let rec seq acc = function
| 0 -> acc
| n -> seq (acc+1) (n-1)
Another is to define mutually recursive functions:
let rec odd n =
  if n = 0 then false
  else if n = 1 then true else even (n - 1)
and even n =
  if n = 0 then true
  else if n = 1 then false else odd (n - 1)
In the first case, there are no technical reasons to prefer one solution over the other. This is just a matter of taste.
The second case is harder. When inferring types you need to split all function definitions into clusters consisting of mutually dependent definitions, in order to narrow the typing environment. In OCaml this is harder to do, since you need to take side effects into account. (Or you can continue without splitting them into strongly connected components, but this leads to another issue: your type system will be more restrictive, i.e., it will disallow more valid programs.)
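A small sketch of why the grouping matters for typing (id and pair are hypothetical names): a definition is generalized only after its recursive group has been checked, so keeping unrelated definitions out of the same group lets them be used polymorphically.

```ocaml
(* Defined on its own, id is generalized to 'a -> 'a before it is used. *)
let id x = x
let pair = (id 1, id "one")

(* For comparison, placing both in one recursive group, e.g.
     let rec id x = x and pair () = (id 1, id "one")
   is rejected, because id is still monomorphic inside the group. *)
```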
But, revisiting the original question and the quote from RWO, I'm still pretty sure that there are no technical reasons for adding the rec flag. Consider SML, which has the same problems but still has rec enabled by default. There is a technical reason for the let ... and ... syntax for defining a set of mutually recursive functions. In SML this syntax doesn't require us to put the rec flag; in OCaml it does, thus giving us more flexibility, like the ability to swap two values with a let x = y and y = x expression.
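The swap mentioned above can be sketched as follows, with hypothetical starting values for x and y. Without rec, both right-hand sides are evaluated in the old environment before either name is rebound.

```ocaml
let x = 1
let y = 2

(* Non-recursive simultaneous binding: the right-hand sides see the old x and y. *)
let x = y and y = x

let swapped = (x, y)
```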