I am trying to extract the occurrences of a string and of all the permutations of its characters from a line of text.
For example, I need to extract the string t =
Let me sketch an algorithm to solve the problem. It is not a problem to be solved with regex.
This solution maintains a sliding window, and checks the frequency of characters in the window with that of t
.
Below is the pseudocode of the algorithm:
function searchPermutation(inpStr, t):
// You may want to check t against the regex ^[A-Za-z0-9]+$ here
// Do a frequency counting of character in t
// For example, t = 'aABBCCC'
// Then freq = { 'A': 1, 'B': 2, 'C': 3, 'a': 1 }
freq = frequency(t)
// Create an empty dict
window = {}
// Number of characters in window
count = 0
// List of matches
result = []
for (i = 0; i < inpStr.length; i++):
// If the current character is a character in t
if inpStr[i] in freq:
// Add the character at current position
window[inpStr[i]]++
// If number of character in window is equal to length of t
if count == t.length:
// Remove the character at the end of the window
window[inpStr[i - t.length]]--
// The count is kept the same here
else: // Otherwise, increase the count
count++
// If all frequencies in window is the same as freq
if count == t.length and window == freq:
// Add to the result a match at (i - t.length + 1, i + 1)
// We can retrieve the string later with substring
result.append((i - t.length + 1, i + 1))
// Reset the window and count (prevent overlapping match)
// Remove the 2 line below if you want to include overlapping match
window = {}
count = 0
else: // If current character not in t
// Reset the window and count
window = {}
count = 0
return result
This should solve the general problem for any t
.