Clustering words based on their char set
Question: Say there is a set of words and I would like to cluster them based on their char bag (multiset). For example, {tea, eat, abba, aabb, hello} will be clustered into {{tea, eat}, {abba, aabb}, {hello}}. abba and aabb are clustered together because they have the same char bag, i.e. two a's and two b's. To make this efficient, a naive way I can think of is to convert each word into a char-count series; for example, abba and aabb will both be converted to a2b2, and tea/eat will be converted to a1e1t1. So that