Ruby - array intersection (with duplicates)

一生所求 2021-01-02 11:23

I have two arrays, array1 and array2. How can I get array3 from them?

array1 = [2,2,2,2,3,3,4,5,6,7,8,9]

array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]

ar         


        
6 Answers
  • 2021-01-02 11:49

    I'd try to reach the expected result this way:

    array1, array2 = [array1, array2].sort_by(&:size)
    arr_copy = array2.dup
    
    array1.each.with_object([]) do |el, arr|
        index = arr_copy.find_index(el)
        arr << arr_copy.slice!(index) if index
    end
    # => [2, 2, 2, 3, 4, 8]
    
  • 2021-01-02 11:51

    This is a fun one; Cary's flat_map solution is particularly clever. Here's an alternative one-liner using regular old map with some assistance from each_with_object:

    array1.each_with_object(array2.dup).map{|v,t| v if (l = t.index v) && t.slice!(l) }.compact
     #=> [2,2,2,3,4,8]
    

    Much of the complexity here involves inline gymnastics used to provide map with sufficient information to complete the task:

     #
     # we want to duplicate array2 since we'll be
     # mutating it to track duplicates       
     #                       \        array1     array2
     #                        \        value     copy  
     #                         \            \   /
    array1.each_with_object(array2.dup).map{|v,t| ... }
     #         |                         /      
     # Enumerator for array1    Iterate over              
     # with a copy of array2    Enumerator with map  
    

    We can use each_with_object to provide an Enumerator for array1 that also gives our method chain access to a copy of array2. map can then iterate over the each_with_object Enumerator (which references array1), loading each value into the local variable v and our array2 copy into the local variable t. From there:

     #                map the value IF...
     #               /  it exists in     and we were able to
     #              / our array2 copy    remove it from our copy
     #            /          |              |
    map{|v,t| v if (l = t.index v) && t.slice!(l) }.compact
     #   |  \         \                               |
     # array1 \        \                          dump nils
     # value   array2   \
     #         copy      load index position into temporary variable l
    

    We iterate over each value of array1 and check whether the value exists within array2 (via t). If it exists, we remove the first occurrence of the value from our copy of array2 and map the value to our resultant array.

    Note that the t.index(v) check before the t.slice! call acts as short-circuit protection in case the value does not exist within t, our copy of array2. We also use an inline trick of assigning the index to a local variable l, (l = t.index v), so that we can reuse l in the subsequent call: t.slice!(l).

    Finally, because this approach maps to nil whenever an array1 value does not exist within array2, we compact the result to remove the nils.
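For readers who find the one-liner dense, the same idea can be spelled out as a small method. This is only a sketch: the method name `intersect_with_duplicates` and its parameter names are made up for illustration and are not part of the answer above.

```ruby
# Hypothetical helper: intersection that keeps duplicates, spelled out step by step.
def intersect_with_duplicates(first, second)
  remaining = second.dup            # copy, so the caller's array is not mutated
  first.each_with_object([]) do |value, result|
    index = remaining.index(value)  # nil when no occurrence is left in the copy
    next unless index
    remaining.delete_at(index)      # consume one occurrence from the copy
    result << value
  end
end

array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
p intersect_with_duplicates(array1, array2)
# prints [2, 2, 2, 3, 4, 8]
```

Using delete_at instead of slice! also avoids relying on the truthiness of the removed element, which matters if the arrays may contain nil or false.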


    For those curious, here are some benchmark tests of the solutions presented thus far. First, here are the speeds clocked performing the operation 100,000 times on the sample arrays:

                     user     system      total        real
    Cary:        1.050000   0.010000   1.060000 (  1.061217)
    Cary+:       1.580000   0.010000   1.590000 (  1.603645)
    Cam:         0.550000   0.010000   0.560000 (  0.552062)
    Mudasobwa:   2.540000   0.050000   2.590000 (  2.610395)
    Sergii:      0.660000   0.000000   0.660000 (  0.665408)
    Sahil:       1.750000   0.010000   1.760000 (  1.769624)
    #Tommy:      0.290000   0.000000   0.290000 (  0.290114)
    

    If we expand the test arrays to hold 10000 integers with a high degree of intersection...

    array1, array2 = [], []  # two separate arrays; array1 = array2 = [] would alias a single array
    10000.times{ array1 << rand(10) }
    10000.times{ array2 << rand(10) }
    

    and loop 100 times, the simple loop solution (Sahil) begins to distinguish itself. Cary's solution also holds up well, especially with preprocessing:

                     user     system      total        real
    Cary:        1.590000   0.020000   1.610000 (  1.615798)
    Cary+:       0.870000   0.010000   0.880000 (  0.879331)
    Cam:         6.680000   0.090000   6.770000 (  6.838829)
    Mudasobwa:   6.740000   0.080000   6.820000 (  6.898394)
    Sergii:      6.760000   0.100000   6.860000 (  6.962025)
    Sahil:       0.740000   0.030000   0.770000 (  0.785975)
    #Tommy:      0.430000   0.010000   0.440000 (  0.446482)
    

    For arrays 1/10th the size with 1000 integers and a low degree of intersection, however...

    array1, array2 = [], []  # two separate arrays; array1 = array2 = [] would alias a single array
    1000.times{ array1 << rand(10000) }
    1000.times{ array2 << rand(10000) } 
    

    when we loop 10 times, the flat_map solution gets flattened... except if we use preprocessing (Cary+):

                     user     system      total        real
    Cary:      135.400000   0.700000 136.100000 (137.123393)
    Cary+:       0.270000   0.010000   0.280000 (  0.268255)
    Cam:         0.670000   0.000000   0.670000 (  0.676438)
    Mudasobwa:   0.670000   0.010000   0.680000 (  0.684088)
    Sergii:      0.660000   0.010000   0.670000 (  0.673881)
    Sahil:       1.970000   2.130000   4.100000 (  4.121759)
    #Tommy:      0.050000   0.000000   0.050000 (  0.045970)
    

    Here's a gist with the benchmarks: https://gist.github.com/camerican/139463b4bd9e0fd89424377931042ce4
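As a rough idea of how timings like these can be collected, here is a minimal sketch using Ruby's standard Benchmark module. It is not the code from the gist: the lambda names, the report labels, and the iteration count are assumptions chosen for illustration, and only two of the solutions are included.

```ruby
require 'benchmark'

array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]

# Two approaches from this thread, wrapped as lambdas (names are illustrative).
flat_map_version = lambda do |a, b|
  (a & b).flat_map { |n| [n] * [a.count(n), b.count(n)].min }
end

slice_version = lambda do |a, b|
  a.each_with_object(b.dup).map { |v, t| v if (l = t.index(v)) && t.slice!(l) }.compact
end

iterations = 1_000  # assumed value, far smaller than the 100,000 runs quoted above

Benchmark.bm(10) do |x|
  x.report('flat_map:') { iterations.times { flat_map_version.call(array1, array2) } }
  x.report('slice!:')   { iterations.times { slice_version.call(array1, array2) } }
end
```

Absolute numbers depend on the machine and Ruby version, so only the relative ordering within a single run is meaningful.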

  • 2021-01-02 11:56
    array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
    array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
    
    a1, a2 = array1.dup, array2.dup # we’ll mutate them
    
    loop.with_object([]) do |_, memo|
      break memo if a1.empty? || a2.empty?
      idx = a2.index(a1.shift) # nil when the element is not left in a2
      memo << a2.delete_at(idx) if idx
    end
    #⇒ [2,2,2,3,4,8]
    
  • 2021-01-02 11:59
        array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
        array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
    

    Getting the frequency of each element in the sample arrays:

        a1_freq = Hash.new(0)
        a2_freq = Hash.new(0)
        dup_items = []
        array1.each { |a| a1_freq[a] += 1 }
        array2.each { |b| a2_freq[b] += 1 }
    

    Finally, for each element of array1, take the minimum of its counts in the two sample arrays. Because both hashes default to 0, an element that is missing from array2 has a count of 0 and contributes nothing.

        # a2_freq[k] is 0 (truthy in Ruby) for missing keys, so no presence check is needed
        a1_freq.each { |k, v| dup_items += [k] * [v, a2_freq[k]].min }
        #dup_items=> [2, 2, 2, 3, 4, 8]
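On Ruby 2.7 or later, the frequency counting above could probably be shortened with Enumerable#tally. The following is a sketch of that variant rather than the answer's own code; it assumes a Ruby version where tally is available.

```ruby
array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]

# Enumerable#tally (Ruby 2.7+) builds the element => count hashes in one call.
a1_freq = array1.tally
a2_freq = array2.tally

# tally returns a plain hash without a default value, so fetch(k, 0)
# is used to treat a missing key as a count of zero.
dup_items = a1_freq.flat_map { |k, v| [k] * [v, a2_freq.fetch(k, 0)].min }
p dup_items
# prints [2, 2, 2, 3, 4, 8]
```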
    
  • 2021-01-02 12:03

    This is a bit verbose, but assuming you mean where the values are at the same position:

    def combine(array1, array2)
        # Iterate over the longer array and compare it with the other one by index.
        longer_array, looped_array =
            array1.length > array2.length ? [array1, array2] : [array2, array1]

        intersection = []
        longer_array.each_with_index do |item, index|
            intersection.push(item) if item == looped_array[index]
        end
        print intersection
    end
    
    
    array_1 = [2,2,2,2,3,3,4,5,6,7,8,9]
    array_2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
    
    
    combine(array_1, array_2)
    

    I just wanted to point out that I have no clue how you arrived at array3, because the three arrays do not all hold the same value at index 3:

    array_1[3] #=> 2
    
    array_2[3] #=> 3
    
    array_3[3] #=> 3
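The positional comparison above can also be sketched more compactly with Array#zip. The method name `same_position_matches` is hypothetical, and this variant returns the array instead of printing it.

```ruby
# Hypothetical helper: keep the values that are equal at the same index.
def same_position_matches(first, second)
  shorter_length = [first.length, second.length].min
  first.first(shorter_length)      # ignore the tail that has no counterpart
       .zip(second)                # pair up elements by index
       .select { |a, b| a == b }   # keep pairs whose values match
       .map(&:first)
end

array_1 = [2,2,2,2,3,3,4,5,6,7,8,9]
array_2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
p same_position_matches(array_1, array_2)
# prints [2, 2, 2, 4]
```

For the sample arrays, matching by position gives [2, 2, 2, 4], which differs from the [2, 2, 2, 3, 4, 8] that the other answers produce.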
    
  • 2021-01-02 12:14
    (array1 & array2).flat_map { |n| [n]*[array1.count(n), array2.count(n)].min }
      #=> [2,2,2,3,4,8]
    

    The steps:

    a = array1 & array2 
      #=> [2, 3, 4, 8]  
    

    The first element of a (2) is passed to the block and assigned to the block variable:

    n = 2
    

    and the block calculation is performed:

    [2]*[array1.count(2), array2.count(2)].min
      #=> [2]*[4,3].min
      #=> [2]*3
      #=> [2,2,2]
    

    so 2 is mapped to [2,2,2]. The calculations are similar for the remaining three elements of a. As I am using flat_map, this returns [2,2,2,3,4,8].

    Do you have trouble remembering how Enumerable#flat_map differs from Enumerable#map? Suppose I had used map rather than flat_map. Then

    a.map { |n| [n]*[array1.count(n), array2.count(n)].min }
      #=> [[2, 2, 2], [3], [4], [8]]
    

    flat_map does nothing more than put a splat in front of each of those arrays:

    [*[2, 2, 2], *[3], *[4], *[8]]
      #=> [2, 2, 2, 3, 4, 8] 
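To check the relationship between map and flat_map described above, a short sketch like the following can be run. The variable names `nested` and `flat` are only illustrative.

```ruby
array1 = [2,2,2,2,3,3,4,5,6,7,8,9]
array2 = [2,2,2,3,4,4,4,4,8,8,0,0,0]
a = array1 & array2

nested = a.map      { |n| [n] * [array1.count(n), array2.count(n)].min }
flat   = a.flat_map { |n| [n] * [array1.count(n), array2.count(n)].min }

p nested                     # prints [[2, 2, 2], [3], [4], [8]]
p flat                       # prints [2, 2, 2, 3, 4, 8]
p nested.flatten(1) == flat  # prints true

# flat_map only removes one level of nesting:
one_level = [[1, [2]], [3]].flat_map { |x| x }
p one_level                  # prints [1, [2], 3]
```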
    

    If the arrays array1 and array2 are large and efficiency is a concern, we could do a bit of O(N) pre-processing:

    def cnt(arr)
      arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    end
    
    cnt1 = cnt(array1)
      #=> {2=>4, 3=>2, 4=>1, 5=>1, 6=>1, 7=>1, 8=>1, 9=>1} 
    cnt2 = cnt(array2)
      #=> {2=>3, 3=>1, 4=>4, 8=>2, 0=>3} 
    
    (array1 & array2).flat_map { |n| [n]*[cnt1[n], cnt2[n]].min }
      #=> [2,2,2,3,4,8]
    