Comparing two arrays containing strings for anagrams in Ruby

廉价感情. 提交于 2019-12-04 16:10:09

[Edit: I've edited my answer to incorporate an efficiency improvement suggested by @raph in a comment on the question (the method anagram? below). That may not be necessary, but I thought it was such a good idea that it should get some exposure. I've also given a detailed explanation, as the OP is new to Ruby, as might be other readers.]

You might consider doing it as follows.

Code

def anagrams(a, b)
  return nil unless a.size == b.size
  a.zip(b).map { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
end

def anagram?(aw, bw)
  return false unless aw.size == bw.size
  counts = aw.downcase.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
  bw.downcase.each_char do |c|
    return false unless counts[c] > 0
    counts[c] -= 1
  end
  true
end

Example

a = ['hello', 'goodbye', 'pants', 'baa']
b = ['helio', 'godbye', 'Spant', 'aba']
anagrams(a, b)
  #=> [0, 0, 1, 1]

Explanation

anagrams method

For the example above,

a.size #=> 4
b.size #=> 4

so we don't return nil in the first line of anagrams.

Next,

c = a.zip(b)
  #=> [["hello", "helio"], ["goodbye", "godbye"],
  #    ["pants", "Spant"], ["baa", "aba"]]

Assuming for a moment that anagram? works as desired:

c.map { |e| anagram?(e.first, e.last) ? 1 : 0 }
  #=> [0, 0, 1, 1]

Enumerable#map passes each element of c (a two-element array) into the block.1. It is clearer, however, to decompose (or "disambiguate") those arrays and assign each of the two words they comprise to a block variable2:

c.map { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
  #=> [0, 0, 1, 1]

The first element passed in is ["hello", "helio"], so

aw => "hello"
bw #=> "helio"

and we execute

anagram?("hello", "helio") ? 1 : 0
  #=> 0

which is shorthand for

if anagram?("hello", "helio")
  1
else
  0
end
  #=> 0

anagram? method

So now let's move on to anagram?, with

aw = "hello"
bw = "helio"

Since

aw.size == bw.size #=> true

we don't return.

Count frequency of letters in the first word

Let me now write the next few lines of anagram? slightly differently:

counts = Hash.new(0)
  #=> {}
aw_down = aw.downcase 
  #=> "hello"
aw_down.each_char { |c| counts[c] += 1 }
  #=> "hello"
counts
  #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}

(The last line is there just to show the value of the hash.)

In the first line we create a hash counts with a default value of zero. All this means is that if counts does not contain the key k, counts[k] will return the default value. Very important: doing so does not change the hash!3

String#each_char4 passes each character of "hello" into the block and assigns it to the block variable c. Initially, c='h' and h={}. We then execute

counts['h'] += 1

which is shorthand for

counts['h'] = counts['h'] + 1

Since counts does not yet have a key 'h', counts['h'] on the right returns the default value:

counts['h'] = 0 + 1 #=> 1
counts #=> {"h"=>1}

Similarly, after 'e' and the first 'l' are passed to the block, we have:

counts #=> {"h"=>1, "e"=>1, "l"=>1} 

However, when we pass the second 'l', we execute

counts['l'] = counts['l'] + 1
  #=>    1 + 1
  #=> 2

and we finish up with

counts #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}

The method Enumerable#each_with_object will become a good friend

This method is used merely to save some steps. It allows us to write:

counts = Hash.new(0)
aw_down.each_char { |c| counts[c] += 1 }

as

counts = aw_down.each_with_object(Hash.new(0)) { |c,h| h[c] += 1 }

and we can also get rid of the line

aw_down = aw.downcase 

by writing

counts = aw.downcase.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }

This may seem like a small saving, but there are many other situations where the use of each_with_object and other Enumerable class methods permit the chaining of methods, which is extremely useful.

Decrementing letter counts for letters in the second word

Recall

counts #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}

We now execute

bw_down = bw.downcase
  #=> "helio"
"helio".each_char do |c|
  return false unless counts[c] > 0
  counts[c] -= 1
end

First, 'h' is passed into the block. As counts['h'] #=> 1, we execute counts['h'] -= 1, so now

counts #=> {"h"=>0, "e"=>1, "l"=>2, "o"=>1}`.

After passing 'e' and 'l' to the block,

counts #=> {"h"=>0, "e"=>0, "l"=>1, "o"=>1}

but when we pass 'i', we find

counts['i'] #=> 0

(i.e., the default value of zero is returned, and we don't want to set counts['i'] to -1) so we return false, having concluded that the two words are not anagrams. (Had the second word been "heeio", we would have returned false when the second 'e' was passed to the block.)

Do we have an anagram?

Since two two words have the same length, if we are able to process all characters of the second word without returning false, we must end up with

counts #=> {"h"=>0, "e"=>0, "l"=>0, "o"=>0}

(no need to check!), meaning the two words are anagrams, so in this case we would return true to anagrams.5 Hence, the last line of anagram?.

Notes

1 Under the hood, this is what's happening:

enum = c.map
  #=> #<Enumerator: [["hello", "helio"], ["goodbye", "godbye"],
  #                  ["pants", "Spant"], ["baa", "aba"]]:map>

Here we can see what elements the enumerator will pass into the block, but sometimes you need to convert the enumerator to an array to get that information:

enum.to_a
  #=> [["hello", "helio"], ["goodbye", "godbye"],
  #    ["pants", "Spant"], ["baa", "aba"]]

It is actually the method Array#each that passes the elements of enum into the block:

enum.each { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
  #=> [0, 0, 1, 1]

2 If we pass [[1,2],3] into a block, and the block variables are written |(a,b),c|, then a=>1, b=>2, c=>3. This is quite handy. Cool, eh?.

3

h = Hash.new('pig')
h['dog'] = 7 #=> 7
h            #=> {"dog"=>7}
h[0]         #=> "pig"
h['cat']     #=> "pig"
h[{:a=>1}]   #=> "pig"
h            #=> {"dog"=>7}

Note there is a form of Hash#new that takes block, which allows keys not in the hash to be added when they are referenced.

4 Instead of aw_down.each_char we could have written aw_down.chars.each, but aw_down.chars creates an unnecessary intermediate array. each_char, an enumerator, merely passes values as they are required.

5 We could return 0 rather than false and 1 rather than true, in which case we could write

a.zip(b).map { |aw,bw| anagram?(aw,bw) }

in anagrams, but wouldn't it be clearer to have anagrams return an array whose values are true or false, rather than 0 or 1?

This doesn't have to be rocket science. In fact, so long as you can consistently represent each array, it's a no-brainer:

a = ['hello', 'goodbye', 'pants', 'baa']
b = ['helio', 'godbye', 'spant', 'aba']
c = ['lohel', 'goedboy', 'spant', 'aab']

def anagram_flatten(array)
  array.collect do |word|
    word.chars.sort.join
  end
end

puts anagram_flatten(a) == anagram_flatten(b)
# => false
puts anagram_flatten(a) == anagram_flatten(c)
# => true

I wouldn't worry about partial comparisons when a simple array vs. array comparison is super quick anyway.

Cristian Lupascu

Regarding your code, there are only two things that need to be fixed:

  • the line x = y? do should be if x == y
  • instead of array.find(index) you need to use array[index] (concrete examples: a.find(z) and b.find(z) will become a[z] and b[z], respectively)

Here's your code, with these two fixes applied. It works:

a = ['hello', 'goodbye', 'pants', 'baa']
b = ['helio', 'godbye', 'spant', 'aba']

x = a.length
y = b.length
z = 0

if x == y
    while z < x do
        if a[z].chars.sort.join == b[z].chars.sort.join
            puts 1
        else
            puts 0
        end

        z += 1
    end
end

For a more ruby-idiomatic solution, see tadman's answer.

Alright, I found a solution!

for i in 0..a.length-1
  a[i].chars.sort == b[i].chars.sort ? 1 : 0
end

Output

0
0
1
1
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!