It is a Google interview question, and most answers I find online use a HashMap or a similar data structure. I am trying to find a solution using a Trie if possible. Anybody could gi
I think the above answers miss the key point. We have a space with 27 dimensions: the first one is the length, and the other 26 are one coordinate per letter. The points in that space are words. The first coordinate of a word is its length. Each of the other coordinates is the number of occurrences of the corresponding letter in that word. For example, the words aaa, abacus, deltoid, gaff, giraffe, microphone, reef, qar, abcdefghijklmnopqrstuvwxyz have the coordinates:
[3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[6, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
[7, 0, 0, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[4, 1, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[7, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[10, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[4, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
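To make the mapping concrete, here is a minimal sketch of how a word could be turned into such a point. The name `word_to_point` is mine, not from any library, and it assumes the word contains only lowercase letters a to z.

```python
from collections import Counter
import string

def word_to_point(word):
    """Return [length, count of 'a', ..., count of 'z'] for a lowercase word.

    Assumption: the word contains only the letters a-z.
    """
    counts = Counter(word)
    # First coordinate is the length, the next 26 are the letter counts.
    return [len(word)] + [counts[c] for c in string.ascii_lowercase]

print(word_to_point("gaff"))
```

The printed list should match the row for gaff above: length 4, one a, two f, one g.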
A good structure for a set of points with coordinates is an R-tree or an R*-tree. Given your collection of letters [x0, x1, ..., x26], you query for all the words that contain at most xi occurrences of letter i, for each letter. That is a box query, and it takes about O(log N) when only a few words match, where N is the number of words in your dictionary (an R-tree gives no worst-case guarantee, though). However, you don't want to scan every word that matches the query just to find the longest one. This is why the first dimension is important.
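The query itself is just an axis-aligned box test on the 27 coordinates. Here is a rough sketch of it in Python. Note that it uses a plain linear scan as a stand-in for the R-tree, so it only shows what the query returns, not the speed; `to_point` and `box_query` are hypothetical helper names, and the letters are assumed to be lowercase a to z.

```python
from collections import Counter
import string

def to_point(word):
    """[length, count of 'a', ..., count of 'z'], assuming lowercase a-z."""
    counts = Counter(word)
    return [len(word)] + [counts[c] for c in string.ascii_lowercase]

def box_query(points, lower, upper):
    """Return the words whose point lies inside the box [lower, upper].

    Linear scan standing in for an R-tree box query (an assumption of
    this sketch, not how a real R-tree library works internally).
    """
    return [w for w, p in points.items()
            if all(lo <= c <= hi for lo, c, hi in zip(lower, p, upper))]

words = ["abacus", "deltoid", "gaff", "giraffe", "microphone", "reef", "qar"]
points = {w: to_point(w) for w in words}

letters = "eeffgair"          # the collection of letters we are given
upper = to_point(letters)     # at most x_i of each letter, length at most X
lower = [0] * 27

print(box_query(points, lower, upper))
```

With these letters the box contains gaff, giraffe and reef; qar is rejected because there is no q available.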
You know that the length of the biggest word is between 0 and X, where X=sum(x_i, i=1..26). You can search iteratively from X down to 1, but you can also do a binary search on the length. You use the first dimension of your array as the query range. You start with the lengths from b=X/2 to a=X. If there is at least one match, the answer is in the upper half, so you search from (a+b)/2 to a; otherwise you search the lower half, below b. You repeat that, halving the range each time, until the range contains a single length. You now have the biggest length and all the matches with this length.
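A sketch of the binary search on the length, under some assumptions: the inner `query` function is again a linear scan standing in for the R-tree box query, the names `to_point` and `longest_words` are made up for the example, and I use a standard binary search over [1, X] rather than the exact interval bookkeeping described above.

```python
from collections import Counter
import string

def to_point(word):
    """[length, count of 'a', ..., count of 'z'], assuming lowercase a-z."""
    counts = Counter(word)
    return [len(word)] + [counts[c] for c in string.ascii_lowercase]

def longest_words(words, letters):
    """Return all longest words that can be built from the given letters."""
    points = {w: to_point(w) for w in words}
    upper = to_point(letters)

    def query(lo, hi):
        # Stand-in for an R-tree box query: length in [lo, hi] and
        # at most x_i occurrences of each letter.
        return [w for w, p in points.items()
                if lo <= p[0] <= hi
                and all(c <= u for c, u in zip(p[1:], upper[1:]))]

    lo, hi = 1, upper[0]          # upper[0] is X, the number of letters
    if not query(lo, hi):
        return []                 # no word fits at all
    # Invariant: the longest matching word has its length in [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if query(mid, hi):        # a match in the upper half?
            lo = mid
        else:
            hi = mid - 1
    return query(lo, lo)

words = ["abacus", "deltoid", "gaff", "giraffe", "microphone", "reef", "qar"]
print(longest_words(words, "eeffgair"))
```

Each pass through the loop issues one box query and halves the range of candidate lengths, so there are about log2(X) queries in total.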
This algorithm should be asymptotically much more efficient than the algorithms above. Assuming each R-tree query costs O(log N), the time complexity is O(log(N)×log(X)). The implementation depends on the R-tree library you use.