Question
I'm attempting to compare two arrays of hashes with very similar hash structure (identical and always-present keys) and return the deltas between the two. Specifically, I'd like to capture the following:
- Hashes in array1 that do not exist in array2
- Hashes in array2 that do not exist in array1
- Hashes which appear in both data sets
This typically can be achieved by simply doing the following:
deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)
The problem for me (which has turned into a 2-3 hour struggle!) is that I need to identify the deltas based on the values of 3 keys within the hash ('id', 'ref', 'name'). The values of these 3 keys are effectively what makes up a unique entry in my data, but I must retain the other key/value pairs of the hash (e.g. 'extra' and numerous other key/value pairs not shown for brevity).
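To make the limitation concrete, here is a minimal sketch showing that Array#- compares whole hashes. The sample rows and the names old_rows, new_rows and id_keys are hypothetical and chosen only for illustration: two entries agree on 'id', 'ref' and 'name' but differ in 'extra', and plain subtraction still reports them as different.

```ruby
# Hypothetical sample data: same 'id'/'ref'/'name', different 'extra'.
old_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'old note' }]
new_rows = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'new note' }]

# Array#- relies on full Hash equality, so the differing 'extra' value
# makes the two entries count as different and nothing is removed.
whole_hash_delta = old_rows - new_rows

# Comparing only the identifying keys shows they describe the same record.
id_keys = %w[id ref name]
same_record = old_rows.first.values_at(*id_keys) == new_rows.first.values_at(*id_keys)

puts whole_hash_delta.inspect
puts same_record
```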
Example Data:
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Expected Outcome (3 separate arrays of hashes):
Object containing data in array1 but not in array2:
[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
Object containing data in array2 but not in array1:
[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Object containing data in BOTH array1 and array2:
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]
I've made numerous attempts at iterating over the arrays and using Hash#keep_if based on the 3 keys, as well as merging both data sets into a single array and then attempting to de-dup based on array1, but I keep coming up empty-handed. Thank you in advance for your time and assistance!
Answer 1:
This isn't very pretty, but it works. It creates a third array containing all unique values in both array1 and array2 and iterates through that.
Then, since include? doesn't allow a custom matcher, we can fake it by using detect and looking for an item in the array which has the custom sub-hash matching. We'll wrap that in a custom method so we can just call it passing in array1 or array2 instead of writing it twice.
Finally, we loop through our array3 and determine whether the item came from array1, array2, or both of them and add to the corresponding output array.
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
# combine the arrays into 1 array that contains items in both array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }
# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
  object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }
  array.detect do |item|
    { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
  end
end
# output array initializing
array1_only = []
array2_only = []
array1_and_array2 = []
# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
  in_array1 = is_included_in(array1, item)
  in_array2 = is_included_in(array2, item)
  if in_array1 && in_array2
    array1_and_array2.push item
  elsif in_array1
    array1_only.push item
  else
    array2_only.push item
  end
end
puts array1_only.inspect # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
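As a small variation on the helper above, Array#any? with a block returns a strict true/false rather than the matching item, and Hash#values_at pulls the identifying values in one call. This is only a sketch: the method name included_by_keys?, its keys parameter and the sample data are made up for illustration and are not part of the answer.

```ruby
# Hypothetical helper: true if any hash in `array` has the same values
# as `object` for the given identifying keys.
def included_by_keys?(array, object, keys = %w[id ref name])
  wanted = object.values_at(*keys)
  array.any? { |item| item.values_at(*keys) == wanted }
end

# Hypothetical sample data.
sample = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' }]

# Same identifying keys, different 'extra': still counts as included.
hit = included_by_keys?(sample, { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'anything' })
# Different 'id': not included.
miss = included_by_keys?(sample, { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' })

puts hit
puts miss
```

Like the detect version, each call scans the array linearly, so checking every item of one array against the other is quadratic in the worst case. That is usually fine for small inputs.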
Answer 2:
For this type of problem it's generally easiest to work with indices.
Code
def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common_keys_of_a1_and_a2 = a1.keys & a2.keys
  [keeper_idx(array1, a1, common_keys_of_a1_and_a2),
   keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end
def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end
def keeper_idx(arr, a, common_keys_of_a1_and_a2)
  arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end
Example
array1 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Notice that the two arrays are slightly different from those given in the question. The question did not specify whether each array could contain two hashes that have the same values for the specified keys. I therefore added a hash to each array to show how that case is dealt with.
keys = ['id', 'ref', 'name']
idx1, idx2 = keepers(array1, array2, keys)
#=> [[1, 4], [2, 3, 4, 5]]
idx1 (idx2) are the indices of the elements of array1 (array2) that remain after matches are removed. (array1 and array2 are not modified, however.)
It follows that the two arrays map to
array1.values_at(*idx1)
#=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
and
array2.values_at(*idx2)
#=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
# {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
The indices of the hashes that are removed are given as follows.
array1.size.times.to_a - idx1
#=> [0, 2, 3]
array2.size.times.to_a - idx2
#=> [0, 1]
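The question also asks for the hashes present in both arrays. One way to obtain them, sketched here, is to map the removed indices back to elements with Array#values_at. The three methods are repeated from the code above (with a shortened parameter name) so the snippet runs on its own; the shortened sample data and the names common1 and common2 are hypothetical.

```ruby
# Methods repeated from the answer so this snippet is self-contained.
def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g, i), h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

def keeper_idx(arr, a, common_keys)
  arr.size.times.to_a - a.values_at(*common_keys).flatten
end

def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common = a1.keys & a2.keys
  [keeper_idx(array1, a1, common), keeper_idx(array2, a2, common)]
end

# Shortened, hypothetical sample data.
array1 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'x' },
          { 'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'y' }]
array2 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'z' },
          { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'y' }]

idx1, idx2 = keepers(array1, array2, %w[id ref name])

# Indices removed as matches, mapped back to the original hashes.
common1 = array1.values_at(*(array1.size.times.to_a - idx1))
common2 = array2.values_at(*(array2.size.times.to_a - idx2))

puts common1.inspect
puts common2.inspect
```

Because matching looks only at the chosen keys, common1 and common2 can differ in other fields such as 'extra', so use whichever side's copy of the record you need.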
Explanation
The steps are as follows.
a1 = make_hash(array1, keys)
#=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
# ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}
a2 = make_hash(array2, keys)
#=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
# ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
# ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
#=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
#=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
#=> [2, 3, 4, 5] (for array2)
Answer 3:
See Array#- and Array#&
array1 - array2 #data in array1 but not in array2
array2 - array1 #data in array2 but not in array1
array1 & array2 #data in both array1 and array2
Since you've tagged this question with set, you can use sets similarly:
require 'set'
set1 = array1.to_set
set2 = array2.to_set
set1 - set2
set2 - set1
set1 & set2
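Note that Array#-, Array#& and the Set operators compare entire hashes, including 'extra'. If only 'id', 'ref' and 'name' should decide a match, as the question asks, one possible adaptation is to build a Set of key tuples for each array and filter against it. This is a sketch, not part of the answer above; the names keys, key_of, seen1 and seen2 and the shortened sample data are hypothetical.

```ruby
require 'set'

keys = %w[id ref name]
# Hypothetical helper: extracts the identifying tuple from a hash.
key_of = ->(h) { h.values_at(*keys) }

# Shortened, hypothetical sample data.
array1 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'old' },
          { 'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' }]
array2 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'new' },
          { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' },
          { 'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10' }]

# Sets of key tuples give constant-time membership checks on average.
seen1 = array1.map(&key_of).to_set
seen2 = array2.map(&key_of).to_set

# partition splits array1 into [rows whose keys appear in array2, the rest].
in_both, array1_only = array1.partition { |h| seen2.include?(key_of.call(h)) }
array2_only = array2.reject { |h| seen1.include?(key_of.call(h)) }

puts array1_only.inspect
puts array2_only.inspect
puts in_both.inspect
```

Here in_both holds the array1 copies of the shared records; the full hashes, including 'extra', are retained in all three results.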
Source: https://stackoverflow.com/questions/45336535/ruby-show-deltas-between-2-array-of-hashes-based-on-subset-of-hash-keys