Finding the shortest repetitive pattern in a string

久未见 提交于 2019-12-01 17:39:18

To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:

result = regexp(str, '^(.+?)(?=\1*$)', 'match');

Some examples:

>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234'

>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234123412341234123'

>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby'

>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby1lullaby2'

I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.

It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.

EDIT: I made the code more compact.

str = 'lullabylullabylullaby';

pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;

while sPtr <= length(str)
     if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
            matchingState = true;
            pPtr = pPtr + 1;
            pPtr = mod(pPtr-1,length(pattern)) + 1;
     else                          %// if match fails, extend pattern string and start again
            if matchingState
                sPtr = sPtr - 1;   %// don't change str index when transitioning out of matching state
            end  
            matchingState = false;
            pattern = str(1:sPtr);
            pPtr = 1;
     end

     sPtr = sPtr + 1;

end

display(pattern);

The output is:

pattern =

lullaby

Note:

This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then

pattern =

lullaby1lullaby2

This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in

pattern =

lullaby1lullaby2

To fix this you could add the lines

if pPtr ~= length(pattern)
    pattern = str;
end
Buck Thorn

Another approach is as follows:

  1. determine length of string, and find all possible factors of the string length value
  2. for each possible factor length, reshape the string and check for a repeated substring

To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.

function repeat = repeats_in_string(str);
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0);
for ii=1:numel(nf)
    repeat = str(1:nf(ii));
    if all(ismember(reshape(str,nf(ii),[])',repeat)); 
        break;
    end
end 

This problem is a great Rorschach test for your approach to problem solving. I'll add a signal engineering solution, which should be simple since the signal is expected to be perfectly repetitive, assuming this holds: find the shortest pattern that upon repetition generates the whole string.

In the following str fed to the function is actually a column vector of floats, not a string, the original string having been converted with str2num(str2mat(str)'):

function res=findshortestrepel(str);
[~,ii] = max(fft(str-mean(str)));
res = str(1:round(numel(str)/(ii-1)));

I performed a small test, comparing this to the regexp solution and found it to be faster overall (blue squares), although somewhat inconsistently, and only if you don't consider the time required to convert the string into a vector of floats (green squares). However I did not pursue this further (not breaking records with this):

Times in sec.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!