In clojure[script], how to return nearest elements between 2 sorted vectors

问题

In clojure[script], how to write a function nearest that receives two sorted vectors a, b and returns for each element of a the nearest element of b?

As an example,

(nearest [1 2 3 101 102 103] [0 100 1000]); [0 0 0 100 100 100]

I would like the solution to be both idiomatic and with good performances: O(n^2) is not acceptable!

回答1:

Using a binary search or a sorted-set incurs a O(n*log m) time complexity where n is (count a) and m (count b).

However leveraging the fact that a and b are sorted the time complexity can be O(max(n, m)).

(defn nearest [a b]
  (if-let [more-b (next b)]
    (let [[x y] b
          m (/ (+ x y) 2)
          [<=m >m] (split-with #(<= % m) a)]
      (lazy-cat (repeat (count <=m) x)
        (nearest >m more-b)))
    (repeat (count a) (first b))))


=> (nearest [1 2 3 101 102 103 501 601] [0 100 1000])
(0 0 0 100 100 100 100 1000)

回答2:

Let n be (count a) and m be (count b). Then, if a and b are both ordered, then this can be done in what I believe ought to be O(n log(log m)) time, in other words, very close to linear in n.

First, let's re-implement abs and a binary-search (improvements here) to be independent of host (leveraging a native, e.g. Java's, version ought to be significantly faster)

(defn abs [x] 
  (if (neg? x) (- 0 x) x))

(defn binary-search-nearest-index [v x]
  (if (> (count v) 1)
    (loop [lo 0 hi (dec (count v))]
      (if (== hi (inc lo))
        (min-key #(abs (- x (v %))) lo hi)
        (let [m (quot (+ hi lo) 2)] 
          (case (compare (v m) x) 
             1 (recur lo m) 
            -1 (recur m hi)
             0 m))))
    0))

If b is sorted, a binary search in b takes log m steps. So, mapping this over a is a O(n log m) solution, which for the pragmatist is likely good enough.

(defn nearest* [a b]
  (map #(b (binary-search-nearest-index b %)) a))

However, we can also use the fact that a is sorted to divide and conquer a.

(defn nearest [a b]
  (if (< (count a) 3)
    (nearest* a b)
    (let [m (quot (count a) 2)
          i (binary-search-nearest-index b (a m))] 
      (lazy-cat
        (nearest (subvec a 0 m) (subvec b 0 (inc i))) 
        [(b i)]
        (nearest (subvec a (inc m)) (subvec b i))))))

I believe this ought to be O(n log(log m)). We start with the median of a and find nearest in b in log m time. Then we recurse on each half of a with split portions of b. If a m-proportional factor of b are split each time, you have O(n log log m). If only a constant portion is split off then the half of a working on that portion is linear time. If that continues (iterative halves of a work on constant size portions of b) then you have O(n).

回答3:

Inspired by @amalloy, I have found this interesting idea by Chouser and wrote this solution:

(defn abs[x]                         
  (max x (- x)))                     

(defn nearest-of-ss [ss x]           
  (let [greater (first (subseq ss >= x))
        smaller (first (rsubseq ss <= x))]
    (apply min-key #(abs (- % x)) (remove nil? [greater smaller]))))

(defn nearest[a b]
  (map (partial nearest-of-ss (apply sorted-set a)) b))

Remark: It's important to create the sorted-set only once, in order to avoid performance penalty!

来源：https://stackoverflow.com/questions/22205858/in-clojurescript-how-to-return-nearest-elements-between-2-sorted-vectors

标签

algorithm

clojure

clojurescript