Fast vector math in Clojure / Incanter

感情败类 2021-01-30 09:40

I'm currently looking into Clojure and Incanter as an alternative to R. (Not that I dislike R, but it is just interesting to try out new languages.) I like Incanter and find the s

5 answers
  • 2021-01-30 09:53

    Here's a Java-array implementation that, on my system, is faster than your R code (YMMV). Note three things: reflection warnings are enabled, which is essential when optimizing for performance; the type hint on y is repeated (the one on the def didn't seem to help for the aset); and everything is cast to primitive double values (dotimes makes sure that i is a primitive int).

    (set! *warn-on-reflection* true)
    (use 'incanter.stats)
    (def ^"[D" x (double-array (sample-normal 1e7)))
    
    (time
     (do
       (def ^"[D" y (double-array (dec (count x))))
       (dotimes [i (dec (count x))]
         (aset ^"[D" y
           i
           (double (- (double (aget x (inc i)))
                      (double (aget x i))))))))
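
    The same array loop can be packaged as a function so that the type hints live on the parameters instead of on global defs. This is only a sketch under assumptions: diff-array is a hypothetical name that is not part of Incanter, the ^doubles hints assume a reasonably recent Clojure (1.3 or later), and it has not been benchmarked against the code above. Evaluating it with *warn-on-reflection* enabled should show whether any reflective calls remain.

```clojure
(defn diff-array
  "Hypothetical helper: returns a double[] holding the successive
  differences xs[i+1] - xs[i]. The result is empty for inputs with
  fewer than two elements."
  ^doubles [^doubles xs]
  (let [n  (max 0 (dec (alength xs)))   ; number of differences
        ys (double-array n)]            ; primitive output array
    (dotimes [i n]
      ;; aget/aset on hinted arrays compile to primitive array access
      (aset ys i (- (aget xs (inc i)) (aget xs i))))
    ys))

(vec (diff-array (double-array [1.0 4.0 9.0 16.0])))
;; => [3.0 5.0 7.0]
```

    Because both the argument and the return value are hinted, callers that pass the result to other hinted array code should not need to repeat the hint.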
    
  • 2021-01-30 09:57

    Here's a solution with transients - appealing but slow.

    (use 'incanter.stats)
    (set! *warn-on-reflection* true)
    (def x (doall (sample-normal 1e7)))
    
    (time
     (def y
          (loop [xs x
                 xs+ (rest x)
                 result (transient [])]
            (if (empty? xs+)
              (persistent! result)
              (recur (rest xs) (rest xs+)
                     (conj! result (- (double (first xs+))
                                      (double (first xs)))))))))
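
    For comparison, the same seq-based idea can be written without an explicit loop. This is a sketch only: diff-seq is a hypothetical name, mapv assumes Clojure 1.4 or later, and nothing here suggests it is faster than the loop above, since it still walks boxed sequences. As far as I know, mapv with several collections builds its result through into, which uses a transient vector internally.

```clojure
(defn diff-seq
  "Hypothetical helper: successive differences of any seqable collection
  of numbers, returned as a persistent vector."
  [xs]
  ;; Pair each element with its predecessor; mapv stops at the shorter input.
  (mapv - (rest xs) xs))

(diff-seq [1.0 4.0 9.0 16.0])
;; => [3.0 5.0 7.0]
```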
    
  • 2021-01-30 10:01

    My final solutions

    After all the testing I found two slightly different ways to do the calculation with sufficient speed.

    First, I used the function diff with different types of return value. The code below returns a vector, but I also timed a version returning a double-array (replace (vec y) with y) and one returning an Incanter matrix (replace (vec y) with (matrix y)). This function relies only on Java arrays. It is based on Jouni's code with some extra type hints removed.

    Another approach is to do the calculations with Java arrays and store the values in a transient vector. As the timings show, this is slightly faster than approach 1 if you want the function to return a vector. This is implemented in the function difft.

    So the choice really depends on what you want to do with the data. I guess a good option would be to overload the function so that it returns the same type that was used in the call. In fact, passing a Java array to diff instead of a vector makes it about 1 s faster.

    Timings for the different functions:

    diff returning vector:

    (time (def y (diff x)))
    "Elapsed time: 4733.259 msecs"
    

    diff returning Incanter.matrix:

    (time (def y (diff x)))
    "Elapsed time: 2599.728 msecs"
    

    diff returning double-array:

    (time (def y (diff x)))
    "Elapsed time: 1638.548 msecs"
    

    difft:

    (time (def y (difft x)))
    "Elapsed time: 3683.237 msecs"
    

    The functions

    (use 'incanter.stats)
    (def x (vec (sample-normal 1e7)))
    
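    (defn diff [x]
      (let [y (double-array (dec (count x)))  ; output: one element shorter than the input
            x (double-array x)]               ; copy the input into a primitive array
        (dotimes [i (dec (count x))]
          (aset y i
                (- (aget x (inc i))
                   (aget x i))))
        (vec y)))                             ; convert the result back to a vector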
    
    
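    (defn difft [x]
      (let [n (dec (count x))                 ; number of differences
            x (double-array x)]
        ;; Thread the transient through the loop: the value returned by
        ;; assoc! must be used rather than relying on in-place mutation.
        (loop [i 0
               y (transient (vec (range n)))] ; pre-sized vector to overwrite
          (if (< i n)
            (recur (inc i)
                   (assoc! y i (- (aget x (inc i))
                                  (aget x i))))
            (persistent! y)))))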
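
    The idea mentioned above, returning the same type that was passed in, could look roughly like the following. It is a sketch under assumptions: diff-like is a hypothetical name, only double arrays and other seqable collections are distinguished (Incanter matrices are not handled), and it has not been timed.

```clojure
(defn diff-like
  "Hypothetical wrapper: successive differences of xs, returned as a
  double[] when xs is a double[], otherwise as a vector."
  [xs]
  (let [arr?       (instance? (Class/forName "[D") xs)  ; is the input a double[]?
        ^doubles a (if arr? xs (double-array xs))       ; copy only when needed
        n          (max 0 (dec (alength a)))
        out        (double-array n)]
    (dotimes [i n]
      (aset out i (- (aget a (inc i)) (aget a i))))
    (if arr? out (vec out))))

(diff-like [1 4 9 16])
;; => [3.0 5.0 7.0]
(vec (diff-like (double-array [1 4 9])))
;; => [3.0 5.0]
```

    Skipping the copy for array inputs is in line with the observation above that passing a Java array directly is faster.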
    
  • 2021-01-30 10:14

    All the comments thus far are by people who don't seem to have much experience speeding up Clojure code. If you want Clojure code to perform identically to Java, the facilities are available to do so. It may make more sense, however, to defer to mature Java libraries like Colt or Parallel Colt for vector math. It may make sense to use Java arrays for the absolute highest-performance iteration.

    @Shane's link is so full of outdated information as to be hardly worth looking at. Also, @Shane's comment that code is slower by a factor of 10 is simply inaccurate (and unsupported; see http://shootout.alioth.debian.org/u32q/compare.php?lang=clojure, and note that these benchmarks don't account for the kinds of optimization possible in 1.2.0 or 1.3.0-alpha1). With a little bit of work it's usually easy to get Clojure code within 4X-5X. Going beyond that usually requires a deeper knowledge of Clojure's fast paths, something that isn't widely disseminated because Clojure is a fairly young language.

    Clojure is plenty fast. But learning how to make it fast is going to take a bit of work/research as Clojure discourages mutable operations and mutable datastructures.
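
    As an illustration of the Java-array iteration mentioned above, clojure.core provides the areduce and amap macros for looping over arrays. The two functions below are hypothetical examples, not code from this thread, and the primitive hints assume Clojure 1.3 or later.

```clojure
(defn sum-of-squares
  "Hypothetical example: sums x*x over a double[] in a primitive loop."
  ^double [^doubles xs]
  (areduce xs i acc 0.0
           (+ acc (* (aget xs i) (aget xs i)))))

(defn scale
  "Hypothetical example: returns a new double[] with every element
  multiplied by k; the input array is left unchanged."
  ^doubles [^doubles xs ^double k]
  (amap xs i ret (* k (aget xs i))))

(sum-of-squares (double-array [1 2 3]))
;; => 14.0
(vec (scale (double-array [1 2 3]) 2.0))
;; => [2.0 4.0 6.0]
```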

  • 2021-01-30 10:19

    Bradford Cross's blog has a bunch of posts about this (he uses this stuff for the startup he works on). In general, using transients in inner loops, type hinting (guided by *warn-on-reflection*), and so on are all good for speed increases. The Joy of Clojure has a great section on performance tuning, which you should read.
