Machine learning in OCaml or Haskell?

天命终不由人 2021-01-29 17:51

I'm hoping to use either Haskell or OCaml on a new project because R is too slow. I need to be able to use support vector machines, ideally separating out each execution to r

11 answers
  • 2021-01-29 18:05

    While dons is correct that multicore parallelism at the thread level is better supported in Haskell, it sounds like you could live with process-level parallelism (from your phrase "ideally separating out each execution to run in parallel"), which is supported quite well in OCaml. Keith pointed out that Haskell has a more powerful type system, but it can also be said that OCaml has a more powerful module system than Haskell.

    As others have pointed out, OCaml's learning curve is gentler than Haskell's; you'll likely be productive more quickly in OCaml. That said, learning OCaml is a great stepping-stone towards learning Haskell because many of the underlying concepts are very similar, so you could always migrate to Haskell later and find a lot of things familiar there. And as you pointed out, there is an OCaml-R bridge.

  • 2021-01-29 18:05

    For examples of Haskell and OCaml in machine learning, see the material on Hal Daume's and Lloyd Allison's homepages. IMO it is much more straightforward to achieve C++-like performance in OCaml than in Haskell. Though, as already said, Haskell has a much nicer community (packages, tools, and support), syntax and features (e.g. FFI, probability monads via typeclasses), and parallel programming support.

  • 2021-01-29 18:07

    Late answer, but a machine learning library in Haskell is available here: https://github.com/mikeizbicki/HLearn

    This library implements various ML algorithms that are designed to allow much faster cross-validation than the usual implementations. It is based on the paper "Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training". The authors claim a 400x speed-up compared to the same task in Weka.
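
    To make the cross-validation claim concrete, here is a minimal sketch of the general idea as I understand it, not HLearn's actual API. It assumes the trained model forms a monoid, so models trained on separate folds can be combined instead of retrained. The `Gaussian` type and the functions `train`, `mean`, and `leaveOneOut` are hypothetical names chosen for illustration.

```haskell
-- Hypothetical model: sufficient statistics of a 1-D Gaussian.
data Gaussian = Gaussian
  { count   :: !Double  -- number of points seen
  , total   :: !Double  -- sum of the points
  , totalSq :: !Double  -- sum of the squared points
  } deriving (Show, Eq)

-- Combining two models is just adding their statistics.
instance Semigroup Gaussian where
  Gaussian a b c <> Gaussian a' b' c' = Gaussian (a + a') (b + b') (c + c')

instance Monoid Gaussian where
  mempty = Gaussian 0 0 0

-- Train on a single point.
train1 :: Double -> Gaussian
train1 x = Gaussian 1 x (x * x)

-- Train on a batch: train (xs ++ ys) == train xs <> train ys.
train :: [Double] -> Gaussian
train = foldMap train1

-- Mean of the fitted Gaussian (NaN for the empty model).
mean :: Gaussian -> Double
mean g = total g / count g

-- For each fold i, the model trained on every fold except i.
-- Each fold is trained once; the rest is cheap monoid combination.
leaveOneOut :: [Gaussian] -> [Gaussian]
leaveOneOut ms = zipWith (<>) before after
  where
    before = init (scanl (<>) mempty ms)  -- everything left of fold i
    after  = tail (scanr (<>) mempty ms)  -- everything right of fold i

main :: IO ()
main = do
  let folds   = [[1, 2], [3, 4], [5, 6]]
      heldOut = leaveOneOut (map train folds)
  mapM_ (print . mean) heldOut
```

    With k folds, ordinary cross-validation retrains on roughly the whole data set k times, while this sketch trains each fold once and does a linear number of `<>` operations. It only works for models whose training really is a monoid homomorphism; that is an assumption of the sketch, and plain SVM training does not obviously satisfy it.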

  • 2021-01-29 18:08

    Hal Daume wrote several major machine learning algorithms during his Ph.D. (he is now an assistant professor and a rising star in the machine learning community).

    On his web page there are an SVM, a simple decision tree, and a logistic regression, all in OCaml. By reading this code, you can get a feeling for how machine learning models are implemented in OCaml.

    Another good example of writing basic machine learning models is the Owl library for scientific and numeric computation in OCaml.

    I'd also like to mention F#, a new .NET language similar to OCaml. Here's a factor graph model written in F# analyzing chess play data. This research also has a NIPS publication.

    FP is suitable for implementing machine learning and data mining models, but what you gain most from it is NOT performance. It is true that FP supports parallel computing better than imperative languages like C# or Java. But implementing a parallel SVM or decision tree has very little to do with the language! Parallel is parallel. The numerical optimizations behind machine learning and data mining are usually imperative; writing them in a purely functional style is usually hard and less efficient. Making these sophisticated algorithms parallel is a very hard task at the algorithm level, not at the language level. If you want to run 100 SVMs in parallel, FP helps. But I don't see the difficulty in running 100 libsvm instances in parallel in C++, not to mention that single-threaded libsvm is more efficient than a not-well-tested Haskell SVM package.
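
    As a rough illustration of the "run 100 SVMs in parallel" case, here is a minimal sketch using only GHC's base library. `trainOne` is a hypothetical stand-in for one independent training run (it is not an SVM), and `runAll` is a helper name invented for this example.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical stand-in for one independent, pure training run.
trainOne :: Int -> Double
trainOne seed = sum [fromIntegral ((seed * k) `mod` 7) | k <- [1 .. 100000]]

-- Run each job on its own lightweight thread and collect the results
-- in input order. `evaluate` forces the result (to weak head normal
-- form) inside the worker thread, so the work happens there and not
-- in the thread that later reads the MVar.
runAll :: (a -> b) -> [a] -> IO [b]
runAll f xs = do
  boxes <- mapM spawn xs
  mapM takeMVar boxes
  where
    spawn x = do
      box <- newEmptyMVar
      _ <- forkIO (evaluate (f x) >>= putMVar box)
      return box

main :: IO ()
main = do
  results <- runAll trainOne [1 .. 100]
  print (length results)
```

    To actually use several cores, this needs to be compiled with `-threaded` and run with `+RTS -N`. The sketch has no error handling: if a job throws, its result is never delivered. Because `trainOne` is pure, the jobs cannot interfere with each other, which is the part FP makes easy; it does nothing to make a single training run faster.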

    Then what do FP languages like F#, OCaml, and Haskell give you?

    1. Easy to test your code. FP languages usually have a top-level interpreter, so you can test your functions on the fly.

    2. Few mutable states. This means that a function given the same arguments always returns the same result, which makes debugging easy in FP languages.

    3. Code is succinct. Type inference, pattern matching, closures, etc. You focus more on the domain logic and less on the language, so when you write the code, you are mainly thinking about the programming logic itself.

    4. Writing code in FP languages is fun.
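
    A small sketch of points 2 and 3, assuming a toy decision tree over lists of numeric features. The `Tree` type and the `predict` and `hits` functions are hypothetical and exist only for illustration.

```haskell
-- A hypothetical binary decision tree over numeric feature vectors.
data Tree
  = Leaf Bool                   -- predicted label
  | Split Int Double Tree Tree  -- feature index, threshold, "<=" branch, ">" branch

-- Pattern matching covers each constructor; the function is pure,
-- so the same tree and input always give the same prediction.
predict :: Tree -> [Double] -> Bool
predict (Leaf label) _ = label
predict (Split i t l r) xs
  | xs !! i <= t = predict l xs
  | otherwise    = predict r xs

-- Number of labelled samples the tree classifies correctly.
-- The lambda is a closure over `tree`.
hits :: Tree -> [([Double], Bool)] -> Int
hits tree = length . filter (\(xs, y) -> predict tree xs == y)

exampleTree :: Tree
exampleTree = Split 0 2.5 (Leaf False) (Split 1 1.0 (Leaf True) (Leaf False))

main :: IO ()
main = do
  print (predict exampleTree [3.0, 0.5])
  print (hits exampleTree [([3.0, 0.5], True), ([1.0, 9.0], False), ([3.0, 2.0], True)])
```

    Loading this in GHCi and calling `predict exampleTree [3.0, 0.5]` directly is the "test on the fly" workflow from point 1. Note that `!!` fails on an out-of-range feature index; a real implementation would want a safer representation.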

  • 2021-01-29 18:08

    For Haskell, consider checking out Hasktorch (which I managed to use for my AI thesis). For OCaml, there seem to be TensorFlow bindings.

  • 2021-01-29 18:16

    If speed is your prime concern, then go for C. Haskell is pretty good performance-wise, but you are never going to get as fast as C. To my knowledge, the only functional language that has bettered C in a benchmark is Stalin Scheme, but that is very old and nobody really knows how it works.

    I've written genetic programming libraries where performance was key, and I wrote them in a functional style in C. The functional style allowed me to easily parallelise the code using OpenMP, and it scales linearly up to 8 cores within a single process. You certainly can't do that in OCaml, although Haskell is improving all the time with regard to concurrency and parallelism.

    The downside of using C was that it took me months to finally find all the bugs and stop the core dumps, which was extremely challenging because of the concurrency. Haskell would probably have caught 90% of those bugs on the first compilation.

    So, speed at any cost? Looking back, I wish I'd used Haskell, as I could have lived with it being 2-3 times slower if it had saved me over a month of development time.
