问题
I was wondering if there was a way to do pattern matching in Octave / matlab? I know Maple 10 has commands to do this but not sure what I need to do in Octave / Matlab. So if a number was 12341234123412341234
the pattern match would be 1234
. I'm trying to find the shortest pattern that upon repetiton generates the whole string.
Please note: the numbers (only numbers will be used) won't be this simple. Also, I won't know the pattern ahead of time (that's what I'm trying to find). Please see the Maple 10 example below which shows that the pattern isn't known ahead of time but the command finds the pattern.
Example of Maple 10 pattern matching:
ns:=convert(12341234123412341234,string);
ns := "12341234123412341234"
StringTools:-PrimitiveRoot(ns);
"1234"
How can I do this in Octave / Matlab? Ps: I'm using Octave 3.8.1
回答1:
To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:
result = regexp(str, '^(.+?)(?=\1*$)', 'match');
Some examples:
>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234'
>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234123412341234123'
>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby'
>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby1lullaby2'
回答2:
I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern
.
It loops through the characters of a string called str
, trying to match against another string called pattern
. If matching fails, the pattern
string is extended as needed.
EDIT: I made the code more compact.
str = 'lullabylullabylullaby';
pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;
while sPtr <= length(str)
if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
matchingState = true;
pPtr = pPtr + 1;
pPtr = mod(pPtr-1,length(pattern)) + 1;
else %// if match fails, extend pattern string and start again
if matchingState
sPtr = sPtr - 1; %// don't change str index when transitioning out of matching state
end
matchingState = false;
pattern = str(1:sPtr);
pPtr = 1;
end
sPtr = sPtr + 1;
end
display(pattern);
The output is:
pattern =
lullaby
Note:
This doesn't allow arbitrary delimiters between occurrences of the pattern
string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';
, then
pattern =
lullaby1lullaby2
This also allows the pattern
to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1';
would still result in
pattern =
lullaby1lullaby2
To fix this you could add the lines
if pPtr ~= length(pattern)
pattern = str;
end
回答3:
Another approach is as follows:
- determine length of string, and find all possible factors of the string length value
- for each possible factor length, reshape the string and check for a repeated substring
To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.
function repeat = repeats_in_string(str);
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0);
for ii=1:numel(nf)
repeat = str(1:nf(ii));
if all(ismember(reshape(str,nf(ii),[])',repeat));
break;
end
end
回答4:
This problem is a great Rorschach test for your approach to problem solving. I'll add a signal engineering solution, which should be simple since the signal is expected to be perfectly repetitive, assuming this holds: find the shortest pattern that upon repetition generates the whole string.
In the following str
fed to the function is actually a column vector of floats, not a string, the original string having been converted with str2num(str2mat(str)')
:
function res=findshortestrepel(str);
[~,ii] = max(fft(str-mean(str)));
res = str(1:round(numel(str)/(ii-1)));
I performed a small test, comparing this to the regexp
solution and found it to be faster overall (blue squares), although somewhat inconsistently, and only if you don't consider the time required to convert the string into a vector of floats (green squares). However I did not pursue this further (not breaking records with this):
Times in sec.
来源:https://stackoverflow.com/questions/28963384/finding-the-shortest-repetitive-pattern-in-a-string