Anagram of String 2 is Substring of String 1

China☆狼群, submitted on 2019-12-18 07:12:44

Question


How can I determine whether any anagram of String 1 is a substring of String 2?

Example:

String 1 = rove

String 2 = stackoverflow

This should return true, because "over" is an anagram of "rove" and is a substring of String 2.


Answer 1:


On edit: my first answer was quadratic in the worst case. I've tweaked it to be strictly linear:

Here is an approach based on the notion of a sliding window: create a dictionary keyed by the letters of the first string, with the frequency counts of those letters as the corresponding values. Think of this as a dictionary of targets which need to be matched by m consecutive letters in the second string, where m is the length of the first string.

Start by processing the first m letters in the second string. For each such letter if it appears as a key in the target dictionary decrease the corresponding value by 1. The goal is to drive all target values to 0. Define discrepancy to be the sum of the absolute values of the values after processing the first window of m letters.

Repeatedly do the following: check if discrepancy == 0 and return True if it does. Otherwise, take the character from m letters ago and check whether it is a target key; if so, increase its value by 1. This either increases or decreases the discrepancy by 1, so adjust it accordingly. Then get the next character of the second string and process it as well: check whether it is a key in the dictionary and, if so, decrease its value by 1 and adjust the discrepancy as appropriate.

Since there are no nested loops and each pass through the main loop involves just a few dictionary lookups, comparisons, additions and subtractions, the overall algorithm is linear.
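To make the discrepancy bookkeeping concrete, here is a small tracing sketch. The helper name `trace_discrepancy` is hypothetical and not part of the answer, and it deliberately recomputes the discrepancy from scratch for every window (so it is not linear); it only shows the values that the incremental updates described above have to reproduce.

```python
from collections import Counter


def trace_discrepancy(s1, s2):
    """Return the discrepancy of every window of len(s1) letters in s2."""
    m = len(s1)
    need = Counter(s1)  # target counts, keyed by the letters of s1
    result = []
    for start in range(len(s2) - m + 1):
        window = Counter(s2[start:start + m])
        # Only letters of s1 are tracked, as in the target dictionary.
        result.append(sum(abs(need[c] - window[c]) for c in need))
    return result


print(trace_discrepancy("rove", "stackoverflow"))
# [4, 4, 3, 2, 1, 0, 1, 2, 2, 3]
```

The single 0 corresponds to the window "over"; moving the window by one position changes the discrepancy by at most 2 (one letter leaves, one letter enters).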

A Python 3 implementation (which shows the basic logic of how the window slides and the target counts and discrepancy are adjusted):

def subAnagram(s1,s2):
    m = len(s1)
    n = len(s2)
    if m > n: return False
    target = dict.fromkeys(s1,0)
    for c in s1: target[c] += 1

    #process initial window
    for i in range(m):
        c = s2[i]
        if c in target:
            target[c] -= 1
    discrepancy = sum(abs(target[c]) for c in target)

    #repeatedly check then slide:
    for i in range(m,n):
        if discrepancy == 0:
            return True
        else:
            #first process letter from m steps ago from s2
            c = s2[i-m]
            if c in target:
                target[c] += 1
                if target[c] > 0: #just made things worse
                    discrepancy +=1
                else:
                    discrepancy -=1
            #now process new letter:
            c = s2[i]
            if c in target:
                target[c] -= 1
                if target[c] < 0: #just made things worse
                    discrepancy += 1
                else:
                    discrepancy -=1
    #if you get to this stage:
    return discrepancy == 0

Typical output:

>>> subAnagram("rove", "stack overflow")
True
>>> subAnagram("rowe", "stack overflow")
False

To stress-test it, I downloaded the complete text of Moby Dick from Project Gutenberg. This has over 1 million characters. "Formosa" is mentioned in the book, hence an anagram of "moors" appears as a substring of Moby Dick. But, not surprisingly, no anagram of "stackoverflow" appears in Moby Dick:

>>> f = open("moby dick.txt")
>>> md = f.read()
>>> f.close()
>>> len(md)
1235186
>>> subAnagram("moors",md)
True
>>> subAnagram("stackoverflow",md)
False

The last call takes roughly 1 second to process the complete text of Moby Dick and verify that no anagram of "stackoverflow" appears in it.




Answer 2:


Let L be the length of String1.

Loop over String2 and check if each substring of length L is an anagram of String1.

In your example, String1 = rove and String2 = stackoverflow.

[stac]koverflow

stac and rove are not anagrams, so move to the next substring of length L.

s[tack]overflow

tack and rove are not anagrams, and so on till you find the substring.

A faster method is to also check whether the last letter of the current substring occurs in String1. For example, once you find that stac and rove are not anagrams and see that 'c' (the last letter of the current substring) does not occur in rove, no substring containing that 'c' can be an anagram of rove, so you can skip past it and take the next substring starting from 'k'.

i.e. [stac]koverflow

stac and rove are not anagrams. 'c' is not present in 'rove', so simply skip over this substring and check from 'k':

stac[kove]rflow

This can significantly reduce the number of comparisons when String2 contains many letters that do not occur in String1.
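As a rough illustration of the saving, the following Python 3 sketch counts how many windows are actually compared with and without the skip. The function `count_window_checks` and its `skip` flag are hypothetical names introduced here, and the anagram test uses sorted strings purely for brevity.

```python
def count_window_checks(s1, s2, skip=True):
    """Return (found, checks): whether an anagram of s1 occurs in s2,
    and how many windows of s2 were compared along the way."""
    m = len(s1)
    key = sorted(s1)
    letters = set(s1)
    start = 0
    checks = 0
    while start + m <= len(s2):
        window = s2[start:start + m]
        checks += 1
        if sorted(window) == key:
            return True, checks
        if skip and window[-1] not in letters:
            # No window containing this letter can match: jump past it.
            start += m
        else:
            start += 1
    return False, checks


print(count_window_checks("rove", "stackoverflow", skip=True))   # (True, 3)
print(count_window_checks("rove", "stackoverflow", skip=False))  # (True, 6)
```

On this example the skip halves the number of windows examined. The worst case is unchanged, since a String2 made only of letters from String1 never triggers a skip.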


Edit:

Here is a Python 2 implementation of the method explained above.

NOTE: This implementation assumes that both strings consist only of the lowercase characters a-z.

def isAnagram(s1, s2):
    c1 = [0] * 26
    c2 = [0] * 26

    # increase character counts for each string
    for i in s1:
        c1[ord(i) - 97] += 1
    for i in s2:
        c2[ord(i) - 97] += 1

    # the strings are anagrams exactly when the character counts match
    return c1 == c2

def isSubAnagram(s1, s2):
    l = len(s1)

    # s2[start:end] represents the substring in s2
    start = 0
    end = l

    while(end <= len(s2)):
        sub = s2[start:end]
        if isAnagram(s1, sub):
            return True
        elif sub[-1] not in s1:
            start += l
            end += l
        else:
            start += 1
            end += 1
    return False

Output:

>>> print isSubAnagram('rove', 'stackoverflow')
True

>>> print isSubAnagram('rowe', 'stackoverflow')
False



Answer 3:


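It can be done with O(n^3) preprocessing and O(k log k) per query, where n is the size of the "given string" (String 2 in your example) and k is the size of the query (String 1 in your example). The O(n^3) bound assumes each substring is sorted in linear time (for example with a counting sort over a fixed alphabet); with a comparison sort the preprocessing is O(n^3 log n).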

Preprocessing:

For each substring s of string2: //O(n^2) of those
    sort s
    store s in some database (a hash table, for example)

Query:

given a query q:
    sort q
    check if q is in the database
    if it is - it's an anagram of some substring
    otherwise - it is not.

This answer assumes you are going to check multiple "queries" (string 1's) for a single string (string 2), and thus tries to optimize the complexity for each query.
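A minimal Python sketch of the preprocessing and query steps, assuming a plain `set` as the "database" and Python's built-in comparison sort. The names `build_index` and `has_sub_anagram` are hypothetical and only serve the illustration.

```python
def build_index(text):
    """Store the sorted form of every non-empty substring of text."""
    index = set()
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            index.add("".join(sorted(text[i:j])))
    return index


def has_sub_anagram(index, query):
    """Sort the query and look it up in the precomputed index."""
    return "".join(sorted(query)) in index


index = build_index("stackoverflow")
print(has_sub_anagram(index, "rove"))  # True
print(has_sub_anagram(index, "rowe"))  # False
```

The index grows quadratically with the length of the text, so this only pays off for short texts that are queried many times.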


As discussed in the comments, you can do the preprocessing step lazily: when you first encounter a query of length k, insert all substrings of length k into the data structure, then proceed as in the original suggestion.
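The lazy variant could look roughly like the sketch below, which indexes substrings of a given length only when a query of that length first arrives. The class name `LazyAnagramIndex` and its methods are assumptions made for this example.

```python
class LazyAnagramIndex:
    """Index sorted substrings of a text, one length at a time, on demand."""

    def __init__(self, text):
        self.text = text
        self._by_length = {}  # length k -> set of sorted length-k substrings

    def _keys_for(self, k):
        # Build the set for length k only on the first query of that length.
        if k not in self._by_length:
            self._by_length[k] = {
                "".join(sorted(self.text[i:i + k]))
                for i in range(len(self.text) - k + 1)
            }
        return self._by_length[k]

    def contains_anagram_of(self, query):
        return "".join(sorted(query)) in self._keys_for(len(query))


index = LazyAnagramIndex("stackoverflow")
print(index.contains_anagram_of("rove"))  # True
print(index.contains_anagram_of("rowe"))  # False
```

Both queries above have length 4, so the set of sorted length-4 substrings is built once and reused for the second query.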




Answer 4:


One option is to generate every permutation of String1 (for rove: rove, rvoe, reov, ...) and then check whether any of these permutations is a substring of String2.
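In Python this idea can be sketched with `itertools.permutations`; the function name `sub_anagram_by_permutations` is hypothetical. Note that a string of length k has up to k! permutations, so this is only practical for very short inputs.

```python
from itertools import permutations


def sub_anagram_by_permutations(s1, s2):
    """Try every ordering of s1 and test it as a substring of s2."""
    return any("".join(p) in s2 for p in permutations(s1))


print(sub_anagram_by_permutations("rove", "stackoverflow"))  # True
print(sub_anagram_by_permutations("rowe", "stackoverflow"))  # False
```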




Answer 5:


// Given two strings, check whether an anagram of the second string is
// present in the first string as a substring.
// e.g. 'atctv' and 'cat' return true, as 'atc' is an anagram of 'cat'.
// Similarly, 'battex' contains an anagram of 'text', namely 'ttex'.

public class SubstringIsAnagramOfSecondString {

    public static boolean isAnagram(String str1, String str2){
        // Strings of different lengths can never be anagrams of each other.
        if(str1.length() != str2.length()) return false;
        Character[] charArr = new Character[str1.length()];

        for(int i = 0; i < str1.length(); i++){
            char ithChar1 = str1.charAt(i);
            charArr[i] = ithChar1;
        }
        for(int i = 0; i < str2.length(); i++){
            char ithChar2 = str2.charAt(i);
            for(int j = 0; j<charArr.length; j++){
                if(charArr[j] == null) continue;
                if(charArr[j] == ithChar2){
                    // Cancel exactly one occurrence; without the break, one
                    // character of str2 would cancel every duplicate in str1.
                    charArr[j] = null;
                    break;
                }
            }
        }
        // Anything left uncancelled means the strings are not anagrams.
        for(int j = 0; j<charArr.length; j++){
            if(charArr[j] != null)
                return false;
        }
        return true;
    }

    public static boolean isSubStringAnagram(String firstStr, String secondStr){
        int secondLength =  secondStr.length();
        int firstLength =  firstStr.length();
        if(secondLength == 0) return true;
        if(firstLength < secondLength || firstLength == 0) return false;
        //System.out.println("firstLength:"+ firstLength +" secondLength:" + secondLength+ 
                //" firstLength - secondLength:" + (firstLength - secondLength));

        for(int i = 0; i < firstLength - secondLength +1; i++){
            if(isAnagram(firstStr.substring(i, i+secondLength),secondStr )){
                return true;
            }
        }
        return false;

    }
    public static void main(String[] args) {
        System.out.println("isSubStringAnagram(xyteabc,ate): "+ isSubStringAnagram("xyteabc","ate"));

    }

}


Source: https://stackoverflow.com/questions/32069724/anagram-of-string-2-is-substring-of-string-1
