Clojure Performance For Expensive Algorithms

后端 未结 2 641
滥情空心
滥情空心 2020-12-29 11:13

I have implemented an algorithm to calculate the longest contiguous common subsequence (not to be confused with longest common subsequence, though not important for

相关标签:
2条回答
  • 2020-12-29 11:28

    EDIT: Added a faster uglier version below the first one.

    Here is my take:

    (defn my-lcs [^objects a1 ^objects a2]
      (first
        (let [n (inc (alength a1))]
          (areduce a1 i 
            [max-len ^ints prev ^ints curr] [0 (int-array n) (int-array n)]
            [(areduce a2 j max-len (unchecked-long max-len)
               (let [match-len 
                     (if (.equals (aget a1 i) (aget a2 j))
                       (unchecked-inc (aget prev j))
                       0)]
                 (aset curr (unchecked-inc j) match-len)
                 (if (> match-len max-len)
                   match-len
                   max-len)))
             curr prev]))))
    

    Main differences with yours: a[gs]et vs a[gs]et-int, use of unchecked- ops (implicitly through areduce), use of a vector as the return value (and "swap" mechanism) and max-len is coerced to primitive before the inner loop (primitive-valued loops are problematic, slightly less since 1.5RC2 but the support isn't perfect yet, however *warn-on-reflection* is not silent).

    And I switched to .equals instead of = to avoid the logic in Clojure's equiv.

    EDIT: let's get ugly and restore the arrays swap trick:

    (deftype F [^:unsynchronized-mutable ^ints curr
                ^:unsynchronized-mutable ^ints prev]
      clojure.lang.IFn
      (invoke [_ a1 a2]
        (let [^objects a1 a1
              ^objects a2 a2]
          (areduce a1 i max-len 0
            (let [m (areduce a2 j max-len (unchecked-long max-len)
                      (let [match-len 
                            (if (.equals (aget a1 i) (aget a2 j))
                              (unchecked-inc (aget prev j))
                              0)]
                        (aset curr (unchecked-inc j) (unchecked-int match-len))
                        (if (> match-len max-len)
                          match-len
                          max-len)))
                  bak curr]
              (set! curr prev)
              (set! prev bak)
              m)))))
    
    (defn my-lcs2 [^objects a1 a2]
      (let [n (inc (alength a1))
            f (F. (int-array n) (int-array n))]
        (f a1 a2)))
    

    On my box, it's 30% faster.

    0 讨论(0)
  • 2020-12-29 11:34

    Here are a couple improvements:

    1. No advantage to fancy type hinting, just use ^objects
    2. aset-int is deprecated I believe -- just plain old aget is faster, by about 3x overall it seems

    Beyond that (and the long type hint on the recur mentioned above), I don't see any obvious ways to improve further.

    (defn lcs
      [^objects a1 ^objects a2]
      (let [a1-len (alength a1)
            a2-len (alength a2)
            prev (int-array (inc a2-len))
            curr (int-array (inc a2-len))]
        (loop [i 0 max-len 0 prev prev curr curr]
          (if (< i a1-len)
            (recur (inc i)
                   (long (loop [j 0 max-len max-len]
                     (if (< j a2-len)
                       (if (= (aget a1 i) (aget a2 j))
                         (let [match-len (inc (aget prev j))]
                           (do
                             (aset curr (inc j) match-len)
                             (recur (inc j) (max max-len match-len))))
                         (do
                           (aset curr (inc j) 0)
                           (recur (inc j) max-len)))
                       max-len)))
                   curr
                   prev)
            max-len))))
    #'user/lcs
    user> (time (lcs a1 a2))
    "Elapsed time: 3862.211 msecs"
    
    0 讨论(0)
提交回复
热议问题