I am looking for a regex that will find repeating letters. So any letter twice or more, for example:
booooooot or abbott
I won't know the
You can match any letter, then use \1 to match that same letter a second time (or more). If you only need to know which letter repeated, $1 will contain it. Otherwise, you can concatenate the second capture onto the first to get the whole run.
my $str = "Foooooobar";
# Only read $1 and $2 after a successful match.
if ($str =~ /(\w)(\1+)/) {
    print $1, "\n";        # prints 'o'
    print $1 . $2, "\n";   # prints 'oooooo'
}
I think you actually want this rather than the "\w" as that includes numbers and the underscore.
([a-zA-Z])\1+
OK, OK, I can take a hint, Leon. Use this for the Unicode world or for POSIX stuff.
([[:alpha:]])\1+
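To make the difference between the three character classes concrete, here is a small sketch. The sample strings are made up for illustration, and the extra outer capture group is only there so the whole run can be printed.

```perl
use strict;
use warnings;

# Sample inputs chosen for illustration only.
my $version = "v1.00__beta";
my $name    = "abbott";

# In list context a successful match returns its capture groups, so the
# outer group (the whole run) is assigned; a failed match leaves undef.
my ($word_run)  = $version =~ /((\w)\2+)/;           # \w also accepts digits and _
my ($ascii_run) = $version =~ /(([a-zA-Z])\2+)/;     # ASCII letters only
my ($alpha_run) = $name    =~ /(([[:alpha:]])\2+)/;  # POSIX alphabetic class

for my $pair (['\w', $word_run], ['[a-zA-Z]', $ascii_run], ['[[:alpha:]]', $alpha_run]) {
    my ($class, $run) = @$pair;
    printf "%-12s %s\n", $class, defined($run) ? "'$run'" : "no match";
}
```

With \w the first hit is the doubled digit "00", which is probably not what the question wants; the letter-only classes skip it.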
Just for kicks, a completely different approach:
if ( ($str ^ substr($str, 1)) =~ /\0+/ ) {
    print "found ", substr($str, $-[0], $+[0] - $-[0] + 1), " at offset ", $-[0];
}
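If the XOR trick looks opaque, this sketch prints the intermediate string for a made-up input. It assumes plain byte strings and the default string behaviour of ^ (no `use feature 'bitwise'`); as far as I know, recent Perls reject string XOR on code points above 0xFF, so treat it as a curiosity rather than a Unicode-safe solution.

```perl
use strict;
use warnings;

my $str = "abbbcd";   # sample input chosen for illustration

# XOR the string with a copy of itself shifted left by one character.
# Identical neighbours XOR to "\0"; the shorter operand is padded with "\0",
# so the final byte is just the last character unchanged.
my $xor = $str ^ substr($str, 1);

# Show the XOR result as hex bytes.
my $hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $xor;
print "$hex\n";   # 03 00 00 01 07 64

my ($run, $offset);
if ($xor =~ /\0+/) {
    $offset = $-[0];
    # n consecutive "\0" bytes mean n + 1 identical characters in $str.
    $run = substr($str, $offset, $+[0] - $-[0] + 1);
    print "found $run at offset $offset\n";   # found bbb at offset 1
}
```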
The following code returns every character that repeats two or more times in a row:
my $str = "SSSannnkaaarsss";
# In list context, /g returns the captured character of every run.
print $str =~ /(\w)\1+/g;
# prints 'Snas'
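If you also need the full run and where it starts, one possible variation is to loop with /g in scalar context. This is only a sketch; the @runs structure and its key names are my own invention.

```perl
use strict;
use warnings;

my $str = "SSSannnkaaarsss";
my @runs;

# In scalar context, each /g iteration resumes where the last match ended.
# Group 1 is the whole run, group 2 the repeated character.
while ($str =~ /((\w)\2+)/g) {
    push @runs, { char => $2, text => $1, offset => $-[0] };
}

for my $r (@runs) {
    printf "%s x%d at offset %d\n", $r->{char}, length($r->{text}), $r->{offset};
}
# S x3 at offset 0
# n x3 at offset 4
# a x3 at offset 8
# s x3 at offset 12
```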
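/(.)\1{1,}/u
The 'u' modifier enables Unicode matching. Note that . matches any character, not just letters, and that \1{1,} (one or more repeats of the captured character) is what finds a character occurring twice or more; \1{2,} would require three or more.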
I think using a backreference would work:
(\w)\1+
\w is basically [a-zA-Z_0-9], so if you only want to match letters between A and Z (case-insensitively), use [a-zA-Z] instead.
(EDIT: or, as Tanktalus mentioned in his comment (and as others have answered as well), [[:alpha:]], which is locale-sensitive.)