Search for permutation of characters of a substring in Python

前端 未结 2 2018
刺人心
刺人心 2021-01-21 02:05

I am trying to extract the occurrences of a string and of all the permutations of its characters from a line of text.

For example, I need to extract the string t =

2条回答
  •  逝去的感伤
    2021-01-21 02:18

    Let me sketch an algorithm to solve the problem. It is not a problem to be solved with regex.

    This solution maintains a sliding window, and checks the frequency of characters in the window with that of t.

    Below is the pseudocode of the algorithm:

    function searchPermutation(inpStr, t):
        // You may want to check t against the regex ^[A-Za-z0-9]+$ here
    
        // Do a frequency counting of character in t
        // For example, t = 'aABBCCC'
        // Then freq = { 'A': 1, 'B': 2, 'C': 3, 'a': 1 }
        freq = frequency(t)
    
        // Create an empty dict
        window = {}
        // Number of characters in window
        count = 0
        // List of matches
        result = []
    
        for (i = 0; i < inpStr.length; i++):
            // If the current character is a character in t
            if inpStr[i] in freq:
                // Add the character at current position
                window[inpStr[i]]++
    
                // If number of character in window is equal to length of t
                if count == t.length:
                    // Remove the character at the end of the window
                    window[inpStr[i - t.length]]--
                    // The count is kept the same here
                else: // Otherwise, increase the count
                    count++
    
                // If all frequencies in window is the same as freq
                if count == t.length and window == freq:
                    // Add to the result a match at (i - t.length + 1, i + 1)
                    // We can retrieve the string later with substring
                    result.append((i - t.length + 1, i + 1))
    
                    // Reset the window and count (prevent overlapping match)
                    // Remove the 2 line below if you want to include overlapping match
                    window = {}
                    count = 0
            else: // If current character not in t
                // Reset the window and count
                window = {}
                count = 0
    
        return result
    

    This should solve the general problem for any t.

提交回复
热议问题