I know Haskell a little bit, and I wonder if it's possible to write something like a matrix-matrix product in Haskell that is all of the following:
There are two angles from which to attack this problem.
Research, along these lines, is ongoing. Now, there are plenty of Haskell programmers who are smarter than me; a fact I am constantly reminded of and humbled by. One of them may come by and correct me, but I don't know of any simple way to compose safe Haskell primitives into a top-of-the-line matrix multiplication routine. Those papers that you talk about sound like a good start.
However, I'm not a computer science researcher. I wonder if it's possible to keep simple things simple in Haskell.
If you cite those papers, maybe we could help decipher them.
Software engineering, along these lines, is well-understood, straightforward, and even easy. A savvy Haskell coder would use a thin wrapper around BLAS, or look for such a wrapper in Hackage.
Deciphering cutting-edge research is an ongoing process that shifts knowledge from the researchers to the engineers. It was a computer science researcher, C.A.R. Hoare, who first discovered quicksort and published a paper about it. Today, it is a rare computer science graduate who can't personally implement quicksort from memory (at least, those that graduated recently).
Almost this exact question has been asked in history a few times before.
Is it possible to write matrix arithmetic in Fortran that is as fast as assembly?
Is it possible to write matrix arithmetic in C that is as fast as Fortran?
Is it possible to write matrix arithmetic in Java that is as fast as C?
Is it possible to write matrix arithmetic in Haskell that is as fast as Java?
So far, the answer has always been, "not yet", followed by "close enough". The advances that make this possible come from improvements in writing code, improvements to compilers, and improvements in the programming language itself.
As a specific example, C was not able to surpass Fortran in many real-world applications until C99 compilers became widespread in the past decade. In Fortran, different arrays are assumed to have distinct storage from each other, whereas in C this is not generally the case. Fortran compilers were therefore permitted to make optimizations that C compilers could not. Well, not until C99 came out and you could add the restrict qualifier to your code.
The Fortran programmers waited. Eventually the processors became complex enough that writing good assembly became more difficult, and the compilers became sophisticated enough that the Fortran was fast.
Then C programmers waited until the 2000s for the ability to write code that matched Fortran. Until that point, they used libraries written in Fortran or assembler (or both), or put up with the reduced speed.
The Java programmers, likewise, had to wait for JIT compilers, and had to wait for specific optimizations to appear. JIT compilers were an esoteric research concept until they became a part of daily life. Bounds-checking optimization was also necessary in order to avoid a test and branch for every array access.
So, it is clear the Haskell programmers are "waiting", just like the Java, C, and Fortran programmers before them. What are we waiting for?
Maybe we're just waiting for someone to write the code, and show us how it's done.
Maybe we're waiting for the compilers to get better.
Maybe we're waiting for an update to the Haskell language itself.
And maybe we're waiting for some combination of the above.
Purity and monads get conflated a lot in Haskell. The reason is that in Haskell, impure functions always use the IO monad. But not every monad is impure: for example, the State monad is 100% pure. So when you say "pure" and "type signature does not use the State monad", those are actually completely independent and separate requirements.
However, you can also use the IO monad in the implementation of pure functions, and in fact, it's quite easy:
import System.IO.Unsafe (unsafePerformIO)

addSix :: Int -> Int
addSix n = unsafePerformIO $ return (n + 6)
Okay, yes, that's a stupid function, but it is pure. It's even obviously pure. The test for purity is twofold:
Does it give the same result for the same inputs? Yes.
Does it produce any semantically significant side effects? No.
The reason we like purity is that pure functions are easier to compose and manipulate than impure functions are. How they're implemented doesn't matter as much. I don't know if you're aware of this, but Integer and ByteString are both basically wrappers around impure C functions, even though the interface is pure. (There's work on a new implementation of Integer, but I don't know how far along it is.)
The question is whether Haskell's approach (purity encoded in the type system) is compatible with efficiency, memory-safety and simplicity.
The answer to that part is "yes", since we can take simple functions from BLAS and put them in a pure, type-safe wrapper. The wrapper's type encodes the safety of the function, even though the Haskell compiler is unable to prove that the function's implementation is pure. Our use of unsafePerformIO in its implementation is both an acknowledgement that we have proven the purity of the function, and a concession that we couldn't figure out a way to express that proof in Haskell's type system.
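Here is a rough sketch of what such a wrapper looks like. To keep it self-contained I'm assuming a stand-in worker, gemmIO, that fills a mutable array in place using the array package that ships with GHC; in real code that worker would be a foreign call into BLAS. The names Matrix, gemmIO, multiply and example are all made up for this illustration.

```haskell
import Control.Monad (forM_)
import Data.Array.IO (IOUArray, freeze, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))
import System.IO.Unsafe (unsafePerformIO)

type Matrix = UArray (Int, Int) Double

-- Impure worker: a stand-in for a foreign routine such as a BLAS call.
-- It allocates a mutable buffer and accumulates the product in place.
gemmIO :: Matrix -> Matrix -> IO Matrix
gemmIO a b = do
  let ((r0, k0), (r1, k1)) = bounds a
      ((_, c0), (_, c1))   = bounds b
  c <- newArray ((r0, c0), (r1, c1)) 0 :: IO (IOUArray (Int, Int) Double)
  forM_ [r0 .. r1] $ \i ->
    forM_ [c0 .. c1] $ \j ->
      forM_ [k0 .. k1] $ \k -> do
        acc <- readArray c (i, j)
        writeArray c (i, j) (acc + a ! (i, k) * b ! (k, j))
  freeze c  -- copy into an immutable array; the buffer never escapes

-- Pure wrapper: the type promises purity, and we (not the compiler) vouch
-- for it. The worker only touches a buffer it allocated itself.
multiply :: Matrix -> Matrix -> Matrix
multiply a b
  | innerA /= innerB = error "multiply: inner dimensions differ"
  | otherwise        = unsafePerformIO (gemmIO a b)
  where
    ((_, ka0), (_, ka1)) = bounds a
    ((kb0, _), (kb1, _)) = bounds b
    innerA = (ka0, ka1)
    innerB = (kb0, kb1)

-- Example: [[1,2],[3,4]] times [[5,6],[7,8]], flattened row by row.
example :: [Double]
example = elems (multiply m n)
  where
    m = listArray ((0, 0), (1, 1)) [1, 2, 3, 4]
    n = listArray ((0, 0), (1, 1)) [5, 6, 7, 8]
```

Callers only ever see multiply :: Matrix -> Matrix -> Matrix. Swapping the stand-in worker for an actual BLAS binding would not change that signature, which is the whole point of the wrapper.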
But the answer is also "not yet", since I don't know how to implement the function entirely in Haskell as such.
Research in this area is ongoing. People are looking at proof systems like Coq and new languages like Agda, as well as developments in GHC itself, in order to see what kind of type system we'd need to prove that high-performance BLAS routines can be used safely. These tools can also be used with other languages like Java. For example, you could write a proof in Coq that your Java implementation is pure.
I apologize for the "yes and no" answer, but no other answer would recognize both the contributions of engineers (who care about "yes") and researchers (who care about "not yet").
P.S. Please cite the papers.