I've got two long streams of numbers coming from two different sources (binary data) in Ruby (1.9.2).
The two sources are encapsulated in the form of two Enumerators.
I want to check that the two streams are exactly equal.
I've come with a couple solutions, but both seem quite inelegant.
The first one simply transforms both into an array:
def equal_streams?(s1, s2)
s1.to_a == s2.to_a
end
This works, but it is not very performant, memory-wise, specially if the streams have lots of information.
The other option is... ugh.
def equal_streams?(s1, s2)
s1.each do |e1|
begin
e2 = s2.next
return false unless e1 == e2 # Different element found
rescue StopIteration
return false # s2 has run out of items before s1
end
end
begin
s2.next
rescue StopIteration
# s1 and s2 have run out of elements at the same time; they are equal
return true
end
return false
end
So, is there a simpler, more elegant way of doing this?
Just a slight refactoring to your code, assuming that your streams do not include an element :eof
.
def equal_streams?(s1, s2)
loop do
e1 = s1.next rescue :eof
e2 = s2.next rescue :eof
return false unless e1 == e2
return true if e1 == :eof
end
end
Using a keyword like loop
should be faster than using a method like each
.
Comparing them one element at a time is probably the best you're going to be able to do but you can do it nicer than your "ugh" solution:
def grab_next(h, k, s)
h[k] = s.next
rescue StopIteration
end
def equal_streams?(s1, s2)
loop do
vals = { }
grab_next(vals, :s1, s1)
grab_next(vals, :s2, s2)
return true if(vals.keys.length == 0) # Both of them ran out.
return false if(vals.keys.length == 1) # One of them ran out early.
return false if(vals[:s1] != vals[:s2]) # Found a mismatch.
end
end
The tricky part is differentiating between just one stream running out and both running out. Pushing the StopIteration
exception into a separate function and using the absence of a key in a hash is a fairly convenient way of doing that. Just checking vals[:s1]
will cause problems if your stream contains false
or nil
but checking for the presence of a key solves that problem.
Here's a shot of doing it by creating an alternative for Enumerable#zip
, which works lazily and doesn't create an entire array. It's combining my implementation of Closure's interleave
and other two answers here (using sentinel value to indicate end of the Enumerable
has been reached - the fact causing the problem is that next
rewinds the Enumerable
once it reached the end).
This solution supports multiple parameters, so you can compare n structures at once.
module Enumerable
# this should be just a unique sentinel value (any ideas for more elegant solution?)
END_REACHED = Object.new
def lazy_zip *others
sources = ([self] + others).map(&:to_enum)
Enumerator.new do |yielder|
loop do
sources, values = sources.map{|s|
[s, s.next] rescue [nil, END_REACHED]
}.transpose
raise StopIteration if values.all?{|v| v == END_REACHED}
yielder.yield values.map{|v| v == END_REACHED ? nil : v}
end
end
end
end
So, when you have variant of zip
which works lazily and doesn't stop iteration when the first enumerable reaches the end, you can use all?
or any?
to actually check corresponding elements for equality.
# zip would fail here, as it would return just [[1,1],[2,2],[3,3]]:
p [1,2,3].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
#=> false
# this is ok
p [1,2,3,4].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
#=> true
# comparing more than two input streams:
p [1,2,3,4].lazy_zip([1,2,3,4],[1,2,3]).all?{|vals|
# check for equality by checking length of the uniqued array
vals.uniq.length == 1
}
#=> false
Following the discussion in the comments, here is zip-based solution, first wrapping block version of zip
within an Enumerator
, then using it to compare corresponding elements.
It works, but there is already mentioned edge case: if the first stream is shorter than the other, remaining elements from the other will be discarded (see the example below).
I have marked this answer as community wiki as other members could improve on it.
def zip_lazy *enums
Enumerator.new do |yielder|
head, *tail = enums
head.zip(*tail) do |values|
yielder.yield values
end
end
end
p zip_lazy(1..3, 1..4).all?{|l,r| l == r}
#=> true
p zip_lazy(1..3, 1..3).all?{|l,r| l == r}
#=> true
p zip_lazy(1..4, 1..3).all?{|l,r| l == r}
#=> false
Here is a 2-source example using a fiber/co-routine. It's a bit long-winded, but it's very explicit about its behavior, which is nice.
def zip_verbose(enum1, enum2)
e2_fiber = Fiber.new do
enum2.each{|e2| Fiber.yield true, e2 }
Fiber.yield false, nil
end
e2_has_value, e2_val = true, nil
enum1.each do |e1_val|
e2_has_value, e2_val = e2_fiber.resume if e2_has_value
yield [true, e1_val], [e2_has_value, e2_val]
end
return unless e2_has_value
loop do
e2_has_value, e2_val = e2_fiber.resume
break unless e2_has_value
yield [false, nil], [e2_has_value, e2_val]
end
end
def zip(enum1, enum2)
zip_verbose(enum1, enum2) {|e1, e2| yield e1[1], e2[1] }
end
def self.equal?(enum1, enum2)
zip_verbose(enum1, enum2) do |e1,e2|
return false unless e1 == e2
end
return true
end
来源:https://stackoverflow.com/questions/6486107/ruby-compare-two-enumerators-elegantly