Algorithm to find the minimum length of substring having all characters of other string

我寻月下人不归 2021-02-04 18:15

I have two strings:
string1 - hello how are you,
string2 - olo (including space character)

Output: lo ho (from hello how are you)

4 Answers
  • 2021-02-04 18:53

    I would calculate positions of characters from string2 within string1 and then pick the permutation with minimum distance between lowest and highest character position:

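    from itertools import product

    #          positions are:
    #          01234567890123456
    string1 = 'hello how are you'
    string2 = 'olo'

    # get string1 positions for each character from set(string2)
    positions = {c: [i for i, ch in enumerate(string1) if ch == c]
                 for c in set(string2)}
    # positions == {'o': [4, 7, 15], 'l': [2, 3]}

    # get all permutations of positions, one position per character of string2
    # (don't repeat the same element), then pick the permutation with minimum
    # distance between min and max position
    # (obviously, this part can be optimized, this is just an illustration)
    permutations = [p for p in product(*(positions[c] for c in string2))
                    if len(set(p)) == len(p)]
    # permutations == [(4, 2, 7), (4, 2, 15), (4, 3, 7), ...]
    the_permutation = min(permutations, key=lambda p: max(p) - min(p))
    # the_permutation == (4, 3, 7)

    # voilà: slice from the lowest to the highest position, inclusive
    output = string1[min(the_permutation):max(the_permutation) + 1]   # 'lo ho'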
    
  • 2021-02-04 19:08

    There is an algorithm that does this in O(N) time.

    Idea: Keep two arrays, isRequired[256] and isFound[256]. isRequired holds the frequency of each character in the pattern, and isFound holds the frequency of each pattern character seen so far while scanning the text. Also keep a counter that tells when a valid window has been found. Once a valid window is found, we can shift the window to the right while maintaining the invariant that it still contains every character of the pattern.

    Program in C++:

    #include <climits>   // INT_MAX
    #include <cstring>   // strlen

    // On success, start and end receive the inclusive bounds of the smallest window.
    // They are left unchanged if text contains no valid window.
    void findMinWindow(const char *text, const char *pattern, int &start, int &end){
        //Calculate lengths of text and pattern
        int textLen = strlen(text);
        int patternLen = strlen(pattern);

        // Declare 2 arrays which keep tab of required & found frequency of each char in pattern
        int isRequired[256] = {0}; //Assuming 8-bit characters; must start at zero
        int isFound[256] = {0};
        int count = 0; //For ascertaining whether a valid window is found

        // Keep a tab of minimum window
        int minimumWindow = INT_MAX;

        //Prepare the isRequired[] array by parsing the pattern
        for(int k=0;k<patternLen;k++){
            //Cast to unsigned char so that the index is never negative
            isRequired[(unsigned char)pattern[k]]++;
        }

        //Let's start parsing the text now
        // Have 2 pointers: i (left edge) and j (right edge) - both starting at 0
        int i=0;
        //Keep moving j forward, keep i fixed till we get a valid window
        for(int j=0;j<textLen;j++){
            unsigned char c = text[j];
            //Check if the character read appears in pattern or not
            if(isRequired[c] == 0){
                //This character does not appear in the pattern; skip this
                continue;
            }
            //We have this character in the pattern, lets increment isFound for this char
            isFound[c]++;

            //increment the count if this character satisfies the invariant
            if(isFound[c] <= isRequired[c]){
                count++;
            }

            //Did we find a valid window yet?
            if(count == patternLen){
                //A valid window is found..lets see if we can do better from here on
                //better means: increasing i to reduce window length while maintaining invariant
                unsigned char left = text[i];
                while(isRequired[left] == 0 || isFound[left] > isRequired[left]){
                    //Either of the above 2 conditions means we should increment i; however we
                    // must decrease isFound for this char as well.
                    //Hence do a check again
                    if(isFound[left] > isRequired[left]){
                        isFound[left]--;
                    }
                    left = text[++i];
                }

                // Note that after the while loop, the invariant is still maintained
                // Lets check if we did better
                int winLength = j-i+1;
                if(winLength < minimumWindow){
                    //update the references we got
                    start = i;
                    end = j;
                    //Update new minimum window length
                    minimumWindow = winLength;
                }
            } //End of if(count == patternLen)
        } //End of for loop
    }
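
    As a cross-check, the same required/found bookkeeping can be sketched in Python. This is a minimal sketch rather than a port by the answer's author: the function name min_window and the use of collections.Counter in place of the two fixed-size arrays are assumptions made for illustration.

```python
from collections import Counter


def min_window(text, pattern):
    """Return the shortest substring of text that contains every character
    of pattern (counting repeats), or None if no such substring exists."""
    required = Counter(pattern)    # plays the role of isRequired[]
    found = Counter()              # plays the role of isFound[]
    count = 0                      # pattern characters matched so far
    best = None                    # (start, end) of the best window, inclusive
    i = 0                          # left edge of the window
    for j, ch in enumerate(text):  # j is the right edge of the window
        if required[ch] == 0:
            continue               # character is not part of the pattern
        found[ch] += 1
        if found[ch] <= required[ch]:
            count += 1
        if count == len(pattern):
            # Drop characters on the left that are unneeded or surplus.
            while required[text[i]] == 0 or found[text[i]] > required[text[i]]:
                if found[text[i]] > required[text[i]]:
                    found[text[i]] -= 1
                i += 1
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
    return None if best is None else text[best[0]:best[1] + 1]


print(min_window("hello how are you", "olo"))
```

    With the strings from the question, the sketch returns the window "lo ho".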
    
  • 2021-02-04 19:12

    Keep two pointers l and r, and a hash table M = character -> count for the characters of string2 that do not yet occur in string1[l..r].

    Initially set l = 0 and r so that string1[l..r] contains all the characters of string2 (if possible). You do that by removing characters from M until it is empty.

    Then proceed by incrementing r by one in each step and then incrementing l as much as possible while still keeping M empty. The minimum over all r - l + 1 (the length of the substring string1[l..r]) is the solution.

    Pythonish pseudocode:

    n = len(string1)
    M = {}   # let's say M is empty if it contains no positive values
    for c in string2:
        M[c]++
    l = 0
    r = -1
    while r + 1 < n and M not empty:
        r++
        M[string1[r]]--
    if M not empty: 
        return "no solution"
    answer_l, answer_r = l, r
    while True:
        while M[string1[l]] < 0:
            M[string1[l]]++
            l++
        if r - l + 1 < answer_r - answer_l + 1:
            answer_l, answer_r = l, r
        r++
        if r == n:
            break
        M[string1[r]]--
    return string1[answer_l..answer_r]
    

    The "is empty" checks can be implemented in O(1) if you maintain the number of positive entries when performing the increment and decrement operations.
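
    The pseudocode, together with the O(1) emptiness check just described, might look as follows in runnable Python. This is only a sketch: the function name shortest_window, the helper take, the counter positive, and the early return for an empty string2 are assumptions that do not appear in the pseudocode.

```python
from collections import defaultdict


def shortest_window(string1, string2):
    """Shortest substring of string1 covering all characters of string2
    (counting repeats), or None if there is no such substring."""
    if not string2:                # assumption: an empty string2 needs no window
        return ""
    M = defaultdict(int)           # character -> count still missing from the window
    for c in string2:
        M[c] += 1
    positive = len(M)              # number of entries with a positive count

    def take(c):
        """Character c enters the window on the right."""
        nonlocal positive
        M[c] -= 1
        if M[c] == 0:              # the last missing copy of c was just supplied
            positive -= 1

    n = len(string1)
    l, r = 0, -1
    while r + 1 < n and positive > 0:   # "M not empty" checked in O(1)
        r += 1
        take(string1[r])
    if positive > 0:
        return None                # string1 cannot cover string2
    answer_l, answer_r = l, r
    while True:
        while M[string1[l]] < 0:   # surplus character on the left edge
            M[string1[l]] += 1
            l += 1
        if r - l < answer_r - answer_l:
            answer_l, answer_r = l, r
        r += 1
        if r == n:
            break
        M[string1[r]] -= 1         # counts never become positive again here
    return string1[answer_l:answer_r + 1]


print(shortest_window("abbabcdbcb", "cbb"))
```

    Once the first valid window has been found, no entry of M can turn positive again, which is why the counter is only consulted in the first loop.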

    Let n be the length of string1 and m be the length of string2.

    Note that l and r are only ever incremented, so there are at most O(n) increments and thus at most O(n) instructions are executed in the last outer loop.

    If M is implemented as an array (I assume the alphabet is constant size), you get runtime O(n + m), which is optimal. If the alphabet is too large, you can use a hash table to get expected O(n + m).

    Example execution:

    string1 = "abbabcdbcb"
    string2 = "cbb"
    
    # after first loop
    M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 }
    
    # after second loop
    l = 0
    r = 5
    M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 }
    
    # increment l as much as possible:
    l = 2
    r = 5
    M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 }
    
    # increment r by one and then l as much as possible
    l = 2
    r = 6
    M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 }
    
    # increment r by one and then l as much as possible
    l = 4
    r = 7
    M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 }
    
    # increment r by one and then l as much as possible
    l = 4
    r = 8
    M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 }
    
    # increment r by one and then l as much as possible
    l = 7
    r = 9
    M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }
    

    The best solution is string1[7..9], i.e. "bcb".

  • 2021-02-04 19:15

    This is an example implementation in JavaScript. The logic is similar to what @Aprillion wrote above.

    DEMO : http://jsfiddle.net/ZB6vm/4/

    var s1 = "hello how are you";
    var s2 = "olo";
    var left, right;
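    var min_distance = Infinity; // must start above any real window length, otherwise dist <= min_distance is never true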
    var answer = "";
    
    // make permutation recursively
    function permutate(ar, arrs, k) {
        // check if the end of recursive call
        if (k == arrs.length) {
            var r = Math.max.apply(Math, ar);
            var l = Math.min.apply(Math, ar);
            var dist = r - l + 1;
            if (dist <= min_distance) {
                min_distance = dist;
                left = l;
                right = r;
            }
            return;
        }
        for (var i = 0; i < arrs[k].length; i++) {
            var v = arrs[k][i];
            if (ar.indexOf(v) < 0) { // skip positions that are already used
                var ar2 = ar.slice();
                ar2.push(v);
                 // recursive call
                permutate(ar2, arrs, k + 1);
            }
        }
    }
    
    function solve() {
        var ar = [];   // 1-dimensional array to store the chosen character positions
        var arrs = []; // 2-dimensional array to store all positions of each character
        for (var i = 0; i < s2.length; i++) {
            arrs[i] = [];
            var c = s2.charAt(i);
            for (var k = 0; k < s1.length; k++) { // loop by s1
                if (s1.charAt(k) == c) {
                    if (arrs[i].indexOf(k) < 0) {
                        arrs[i].push(k); // save position found
                    }
                }
            }
        }
        // call permutate
        permutate(ar, arrs, 0);
        answer = s1.substring(left, right + 1);
        alert(answer);
    }
    
    solve();
    

    Hope this helps.
