I have a string that may contain an arbitrary number of single-letters separated by spaces. I am looking for a regex (in Perl) that will remove spaces between all (unknown n
This should do the trick:
my $str = ...;
$str =~ s/ \b(\w) \s+ (\w)\b /$1$2/gx;
That removes the space between all single nonspace characters. Feel free to replace \S
with a more restrictive character class if needed. There also may be some edge cases related to punctuation characters that you need to deal with, but I can't guess that from the info you have provided.
As Ether helpfully points out, this fails on one case. Here is a version that should work (though not quite as clean as the first):
s/ \b(\w) ( (?:\s+ \w\b)+ ) /$1 . join '', split m|\s+|, $2/gex;
I liked Ether's test based approach (imitation is the sincerest form of flattery and all):
use warnings;
use strict;
use Test::Magic tests => 4;
sub clean {
(my $x = shift) =~ s{\b(\w) ((?: \s+ (\w)\b)+)}
{$1 . join '', split m|\s+|, $2}gex;
$x
}
test 'space removal',
is clean('ab c d') eq 'ab cd',
is clean('a bcd e f gh') eq 'a bcd ef gh',
is clean('a b c') eq 'abc',
is clean('abc d') eq 'abc d';
returns:
1..4
ok 1 - space removal 1
ok 2 - space removal 2
ok 3 - space removal 3
ok 4 - space removal 4
You can do this with lookahead and lookbehind assertions, as described in perldoc perlre:
use strict;
use warnings;
use Test::More;
is(tran('ab c d'), 'ab cd');
is(tran('a bcd e f gh'), 'a bcd ef gh');
is(tran('a b c'), 'abc');
is(tran('abc d'), 'abc d');
sub tran
{
my $input = shift;
(my $output = $input) =~ s/(?<![[:lower:]])([[:lower:]]) (?=[[:lower:]])/$1/g;
return $output;
}
done_testing;
Note the current code fails on the second test case, as the output is:
ok 1
not ok 2
# Failed test at test.pl line 7.
# got: 'abcd efgh'
# expected: 'a bcd ef gh'
ok 3
ok 4
1..4
# Looks like you failed 1 test of 4.
I left it like this as your second and third examples seem to contradict each other as to how leading single characters should be handled. However, this framework should be enough to allow you to experiment with different lookaheads and lookbehinds to get the exact results you are looking for.