Words with at least 2 common letters [closed]

99封情书 posted on 2020-01-05 08:14:31

Question


A string is named 2-consistent if each word has at least 2 letters in common with the next word.


For example

"Atom another era": atom has a and t in common with another, and another has e and a in common with era (the choice of shared letters is not unique).

First of all, I need a data structure that takes 2 words and answers, in constant time, the question "Do these words have at least 2 letters in common?"

Now, given a string of n words, I need to find the longest 2-consistent substring.

I can't figure out what data structure to use. I thought of a radix tree or a prefix tree, but I could not find the answer. Can you help me?


Answer 1:


Assuming unaccented letters and ignoring capitalization, for each word you can store a bit-field in a 32-bit integer where bits 0-25 are set to 1 if the corresponding letter from a-z is present.

The integer can be computed in linear time like this:

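#include <ctype.h>

/* Builds a bit field in which bit i is set if the letter 'a' + i occurs in word. */
int getBitField(const char* word)
{
    int bits = 0;
    for (; *word; word++)
    {
        int c = tolower((unsigned char)*word);   /* ignore capitalization */
        if (c >= 'a' && c <= 'z')                /* skip anything outside a-z */
            bits |= 1 << (c - 'a');
    }
    return bits;
}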

If the words are English words (or words of some other language) with a bounded maximum length, the difference between constant and linear time is fairly meaningless: for the sake of argument, every word shorter than the maximum can be padded with non-matching characters, which makes computing the bit field a constant-time operation.

Once you have the bit fields for two words, you can test whether they are 2-consistent in constant time: AND them together and check that the result is neither zero (no letters in common) nor a power of 2 (exactly one letter in common, since only a single bit is set). ANDing a number with itself minus 1 gives zero in both of those cases, so a single comparison covers them.

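#include <stdbool.h>

/* True if the two bit fields have at least two set bits in common. */
bool is2Consistent(int word1bits, int word2bits)
{
    int common = word1bits & word2bits;
    return (common & (common - 1)) != 0;   /* 0 when common has zero or one bit set */
}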
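As a quick check against the question's example, here is a minimal sketch that combines the two steps into one call. The names `letterMask` and `wordsShareTwoLetters` are made up for this illustration, and the sketch assumes plain a-z letters with case ignored.

```c
#include <ctype.h>
#include <stdbool.h>

/* Same idea as getBitField: bit i is set if the letter 'a' + i occurs.
   (Hypothetical helper name, used only in this sketch.) */
static unsigned letterMask(const char *word)
{
    unsigned bits = 0;
    for (; *word; word++) {
        int c = tolower((unsigned char)*word);
        if (c >= 'a' && c <= 'z')
            bits |= 1u << (c - 'a');
    }
    return bits;
}

/* True if the two words share at least two distinct letters. */
bool wordsShareTwoLetters(const char *a, const char *b)
{
    unsigned common = letterMask(a) & letterMask(b);
    return (common & (common - 1)) != 0;
}
```

With these definitions, "Atom" and "another" (sharing a, o, t) and "another" and "era" (sharing a, e, r) pass the test, while "atom" and "era" share only a and fail it.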

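This won't treat words such as 'meet' and 'beef' as 2-consistent: they share only one distinct letter (e), even though it occurs twice in each word. If you want repeated letters to count, the bit field is not enough.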
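If repeated letters should count, one possible alternative (not part of the bit-field approach) is to compare per-letter counts instead. The sketch below is an assumption about what "counting repeats" would mean: each letter contributes the smaller of its two occurrence counts. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Counts letters shared by two words, counting repeats: for each letter
   a-z, add min(occurrences in a, occurrences in b). */
int countCommonLetters(const char *a, const char *b)
{
    int countA[26] = {0}, countB[26] = {0};
    int common = 0;

    for (; *a; a++) {
        int c = tolower((unsigned char)*a);
        if (c >= 'a' && c <= 'z')
            countA[c - 'a']++;
    }
    for (; *b; b++) {
        int c = tolower((unsigned char)*b);
        if (c >= 'a' && c <= 'z')
            countB[c - 'a']++;
    }
    for (int i = 0; i < 26; i++)
        common += countA[i] < countB[i] ? countA[i] : countB[i];
    return common;
}

/* 2-consistency under the "repeats count" reading. */
bool is2ConsistentWithRepeats(const char *a, const char *b)
{
    return countCommonLetters(a, b) >= 2;
}
```

This costs time proportional to the word lengths rather than a single AND, but the counts could be precomputed per word in the same way as the bit fields.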

If you wanted to test for 3-consistency, you just need to add an extra line to the function:

bool is3Consistent(int word1bits, int word2bits)
{
    int common = word1bits & word2bits;
    common &= (common - 1);
    return (common & (common - 1)) != 0;
}

ANDing an integer with itself minus one simply clears its least significant set bit, so you can apply it any number of times to test for 4-consistency, 5-consistency, etc.
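The repeated application can be written as a loop. This is a minimal sketch, assuming k is at least 1; `isKConsistent` is a hypothetical name that follows the naming of the functions above.

```c
#include <stdbool.h>

/* True if the two bit fields have at least k set bits in common (k >= 1). */
bool isKConsistent(unsigned word1bits, unsigned word2bits, int k)
{
    unsigned common = word1bits & word2bits;

    /* Clear the lowest set bit k-1 times. If at least k bits were shared,
       something is still left afterwards. */
    for (int i = 1; i < k && common != 0; i++)
        common &= common - 1;

    return common != 0;
}
```

For k = 2 and k = 3 this performs the same operations as is2Consistent and is3Consistent. Since only bits 0-25 are ever set, the loop runs at most 25 times, so the test stays constant time for any k.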




Answer 2:


Part 1: Are wordOne and wordTwo 2-consistent?

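// Returns true if the two words share at least two distinct letters (a-z, case-insensitive).
public bool IsWordsTwoConsistent(string first, string second)
{
    int[] letters = new int[26];
    int countDoubles = 0;

    foreach (char c in first.ToLowerInvariant())
    {
        if (c >= 'a' && c <= 'z')
            letters[c - 'a']++;
    }

    foreach (char c in second.ToLowerInvariant())
    {
        if (c >= 'a' && c <= 'z' && letters[c - 'a'] > 0)
        {
            letters[c - 'a'] = 0;   // count each shared letter only once
            countDoubles++;
        }

        if (countDoubles > 1)
            return true;
    }

    return false;
}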

Part 2: Longest 2-consistent substring

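// Returns the index of the first word of the longest 2-consistent run,
// or -1 if no two adjacent words are 2-consistent.
public int GetPositionLongestTwoConsistentSubstring(string input)
{
    string[] wordsArray = input.Split(' ');
    int maxLocation = -1, maxLength = 0;
    int candLocation = -1, candLength = 0;  // current candidate run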

    for (int i = 0 ; i < wordsArray.Length - 1 ; i++)
    {
        if (IsWordsTwoConsistent(wordsArray[i], wordsArray[i+1]))
        {
            candLength++;
            if (candLocation == -1)
                candLocation = i;
        }
        else
        {
            if (candLength > maxLength)
            {
                maxLength = candLength;
                maxLocation = candLocation;
            }           
            candLength = 0;
            candLocation = -1;
        }
    }

    if (candLength > maxLength)
    {
        maxLength = candLength;
        maxLocation = candLocation;
    }

    return maxLocation;
}
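For comparison, the same linear scan can be sketched in C on top of the bit-field test from Answer 1. This is only an illustration under a few assumptions: the words are already split into an array, letters are a-z with case ignored, and a single word counts as a run of length 1. The names `letterMask` and `longest2ConsistentRun` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Bit i is set if the letter 'a' + i occurs in word. */
static unsigned letterMask(const char *word)
{
    unsigned bits = 0;
    for (; *word; word++) {
        int c = tolower((unsigned char)*word);
        if (c >= 'a' && c <= 'z')
            bits |= 1u << (c - 'a');
    }
    return bits;
}

/* Finds the longest run of consecutive words in which every neighbouring
   pair shares at least two distinct letters. Stores the index of the first
   word of that run in *start and returns its length in words (0 if n == 0). */
size_t longest2ConsistentRun(const char *const words[], size_t n, size_t *start)
{
    size_t bestStart = 0, bestLen = 0;
    size_t runStart = 0;
    unsigned prev = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned cur = letterMask(words[i]);
        unsigned common = prev & cur;

        /* The run breaks when the previous and current word share
           fewer than two letters. */
        if (i > 0 && (common & (common - 1)) == 0)
            runStart = i;

        if (i - runStart + 1 > bestLen) {
            bestLen = i - runStart + 1;
            bestStart = runStart;
        }
        prev = cur;
    }

    *start = bestStart;
    return bestLen;
}
```

Each word's bit field is computed once, so the scan takes time linear in the total number of characters. Unlike the C# version above, which counts consistent pairs, this sketch reports the length in words.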



Answer 3:


First of all, I need a data structure that takes 2 words and answers, in constant time, the question "Do these words have at least 2 letters in common?"

Easy. First compute the adjacency matrix for the dictionary you are using, where 'adjacent' means 'having at least two letters in common'. I disagree with the comments above: storing even a comprehensive English dictionary doesn't take much data these days. The full adjacency matrix might take more space than you'd like, so use a sparse array.

Now, bear in mind that an English word is just a number in base 26 (or base 52 if you insist on distinguishing capital letters), so looking up the row and column for a pair of words is a constant-time operation, and you have the solution to your question.

Oh sure, this consumes space and takes a fair amount of pre-computation, but the OP asked for a data structure that answers the question in constant time.
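A toy version of the precomputed matrix might look like the sketch below. Everything in it is an assumption made for illustration: the four-word dictionary, the dense `bool` matrix (a real dictionary would want the sparse representation mentioned above), and the names `dict`, `buildAdjacency` and `areAdjacent`. Mapping a word to its row index is left out; the sketch assumes the indices are already known.

```c
#include <stdbool.h>
#include <stddef.h>

#define DICT_SIZE 4   /* hypothetical toy dictionary size */

static const char *const dict[DICT_SIZE] = {"atom", "another", "era", "sky"};
static bool adjacent[DICT_SIZE][DICT_SIZE];

/* Bit i is set if the letter 'a' + i occurs; assumes lowercase a-z input. */
static unsigned letterMask(const char *word)
{
    unsigned bits = 0;
    for (; *word; word++)
        bits |= 1u << (*word - 'a');
    return bits;
}

/* Precomputation: adjacent[i][j] is true when words i and j share
   at least two distinct letters. */
void buildAdjacency(void)
{
    for (size_t i = 0; i < DICT_SIZE; i++)
        for (size_t j = 0; j < DICT_SIZE; j++) {
            unsigned common = letterMask(dict[i]) & letterMask(dict[j]);
            adjacent[i][j] = (common & (common - 1)) != 0;
        }
}

/* Constant-time query once both dictionary indices are known. */
bool areAdjacent(size_t i, size_t j)
{
    return adjacent[i][j];
}
```

After `buildAdjacency()` has run, `areAdjacent(0, 1)` (atom, another) is true and `areAdjacent(0, 2)` (atom, era) is false.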



Source: https://stackoverflow.com/questions/31445651/words-with-at-least-2-common-letters
