I've solved a problem that asks you to write a method that determines which words in a supplied array are anagrams and groups the anagrams into sub-arrays in the output.
I solved it in what seems to be the typical way: sorting each word's characters and grouping the words in a hash keyed by the sorted characters.
When I originally started looking for a way to do this, I noticed that String#sum exists, which adds the ordinals of each character together.
I'd like to work out some way to detect anagrams using sum. For example, "cars" and "scar" are anagrams and their sum is 425.
Given an input of %w[cars scar for four creams scream racs], the expected output (which I already get using the hash solution) is: [[cars, scar, racs], [for], [four], [creams, scream]].
It seems like doing something like:
input.each_with_object(Hash.new([])) do |word, hash|
  hash[word.sum] += [word]
end
is the way to go. That gives you a hash where the value for the key 425 is ['cars', 'scar', 'racs']. What I think I'm missing is moving that into the expected output format.
Unfortunately, I don't think String#sum is a robust way to solve this problem.
Consider:
"zaa".sum # => 316
"yab".sum # => 316
Same sum, but not anagrams.
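A quick way to see the problem is to group a list by sum and then check which groups are real anagram groups. The following is only a minimal sketch; the helper names group_by_sum and all_anagrams? are made up for illustration.

```ruby
# Hypothetical helper: group words by String#sum and return only the groups.
def group_by_sum(words)
  words.group_by(&:sum).values
end

# Hypothetical helper: true if every word in the group has the same sorted characters.
def all_anagrams?(group)
  group.map { |w| w.chars.sort }.uniq.size == 1
end

groups = group_by_sum(%w[zaa yab aza cars scar])
groups.each { |g| puts "#{g.inspect} all anagrams? #{all_anagrams?(g)}" }
# ["zaa", "yab", "aza"] all anagrams? false
# ["cars", "scar"] all anagrams? true
```

The first group shares the sum 316 but mixes two different letter sets, which is exactly the false positive described above.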
Instead, how about grouping them by the sorted order of their characters?
words = %w[cars scar for four creams scream racs]
anagrams = words.group_by { |word| word.chars.sort }.values
# => [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]]
words = %w[cars scar for four creams scream racs]
res = {}
words.each do |word|
  key = word.split('').sort.join
  res[key] ||= []
  res[key] << word
end
p res.values
# => [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]]
Actually, I think you could use sums for anagram testing, not by summing the chars' ordinals themselves, but with something like this instead:
words = %w[cars scar for four creams scream racs]
# the base must exceed the length of the longest word, so a letter's count never carries into the next digit:
base = words.map(&:length).max + 1
# => 7
words.group_by { |word|
  word.bytes.map { |b|
    base ** (b - 'a'.ord)
  }.inject(:+)
}
# => {1861044111897706=>["cars", "scar", "racs"], 233308737076863=>["for"], 80025575034688864=>["four"], 1861057953187308=>["creams", "scream"]}
Not sure if this is 100% correct, but I think the logic stands.
The idea is to map every word to a base-N number in which each digit position stands for a different char and the digit's value is the number of times that char occurs. N is one more than the length of the longest word in the input set, so no digit can overflow into the next one. This assumes the words contain only the lowercase letters a to z.
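To see why the choice of base matters in this encoding, here is a small sketch (the method name signature is hypothetical) that compares a base equal to the longest word's length with a base one larger. With the smaller base, a word made of N copies of one letter carries into the next digit and collides with a word that is not an anagram of it.

```ruby
# Hypothetical helper: encode a lowercase a-z word as the sum of base ** letter_index,
# so each letter's count becomes one "digit" of a base-N number.
def signature(word, base)
  word.bytes.sum { |b| base**(b - 'a'.ord) }
end

words = %w[aaaaaa b]
maxlen = words.map(&:length).max # 6

# Base equal to the longest length: six a's carry over into the "b" digit.
p signature('aaaaaa', maxlen) == signature('b', maxlen)         # => true (false positive)

# Base one larger than the longest length: no carry, so the values differ.
p signature('aaaaaa', maxlen + 1) == signature('b', maxlen + 1) # => false
```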
To get the desired output format, you just need hash.values. But note that just using the sum of the character codes in a word could fail on some inputs. It is possible for the sums of the character codes in two words to be the same by chance, when they are not anagrams.
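For the sample input from the question, taking the values looks roughly like this. It is a sketch only: the method name sum_groups is invented here, and it happens to work for this input because none of the sums collide.

```ruby
# Hypothetical helper: group words by String#sum, then drop the keys.
def sum_groups(input)
  by_sum = input.each_with_object(Hash.new { |h, k| h[k] = [] }) do |word, hash|
    hash[word.sum] << word
  end
  by_sum.values
end

p sum_groups(%w[cars scar for four creams scream racs])
# => [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]]
```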
If you used a different algorithm to combine the character codes, the chances of incorrectly identifying words as "anagrams" could be made much lower, but still not zero. Basically you need some kind of hash algorithm, but with the property that the order of the values being hashed doesn't matter. Perhaps map each character to a different random bitstring, and take the sum of the bitstrings for each character in the string?
That way, the chance of any two non-anagrams giving you a false positive would be approximately 1 in 2 ** bitstring_length.
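A rough sketch of that idea follows. The 64-bit width, the fixed seed, the restriction to lowercase a to z, and the names BITS, LETTER_CODES and anagram_hash are all assumptions made for illustration, not part of the suggestion above.

```ruby
# Assumption: 64-bit codes and a fixed seed so runs are repeatable.
BITS = 64
rng = Random.new(42)

# One random BITS-wide integer per lowercase letter.
LETTER_CODES = ('a'..'z').to_h { |ch| [ch, rng.rand(2**BITS)] }.freeze

# Hypothetical helper: order-independent hash of a word.
# Addition is commutative, so anagrams always produce the same value.
# Raises KeyError for characters outside a-z.
def anagram_hash(word)
  word.each_char.sum { |ch| LETTER_CODES.fetch(ch) } % (2**BITS)
end

words = %w[cars scar for four creams scream racs]
p words.group_by { |w| anagram_hash(w) }.values
```

Anagrams are guaranteed to land in the same group; non-anagrams could still collide, just with very low probability, so a final check with sorted characters would be needed if false positives are unacceptable.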
Source: https://stackoverflow.com/questions/9517745/ruby-anagram-using-stringsum