I've been wrestling with an issue I was hoping to solve with regex.
Let's say I have a string that can contain any alphanumeric with the possibility of a substrin
This regex should do the trick:
[ ](?=[^\]]*?(?:\[|$))
Just replace the space that was matched with "".
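Basically, the lookahead makes sure that, reading forward from the space you are going to remove, the next bracket is a "[" (or the string ends first) rather than a "]". A space whose next bracket is a "]" is inside a bracket pair and is left alone.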
That should work as long as you don't have nested square brackets, e.g.:
a a[b [c c]b]
Because in that case, the space after the first "b" will be removed and it will become:
aa[b[c c]b]
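As a rough illustration, here is the same pattern tried out in Python (my choice of language for the sketch; the function name is made up). It assumes only literal spaces need removing and that brackets are not nested.

```python
import re

# A space that reaches a "[" or the end of the string before it reaches any "]".
OUTSIDE_SPACE = re.compile(r"[ ](?=[^\]]*?(?:\[|$))")

def remove_outside_spaces(text):
    """Delete every space matched by the lookahead pattern."""
    return OUTSIDE_SPACE.sub("", text)

print(remove_outside_spaces("a b [c d] e f"))  # ab[c d]ef
print(remove_outside_spaces("a a[b [c c]b]"))  # aa[b[c c]b]  (the nested case goes wrong)
```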
How to do this depends on what should be done with:
a b [ c [ d [ e ] f ] g
That is ambiguous; possible answers are at least:
ab[ c [ d [ e ] f ]g
ab[ c [ d [ e ]f]g
For the first two cases, you can use regexps. For the third case, you'd be much better off with a (small) parser.
For either case one or two, split the string on the first [. Strip spaces from everything before the [ (that's obviously outside of the brackets). Next, look for .*\] (case 1) or .*?\] (case 2) and move that over to your output. Repeat until you're out of input.
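The split-and-copy loop described above could look something like this in Python. It is only a sketch: the function name and the greedy flag are hypothetical, and copying an unclosed bracket through unchanged is my own assumption, since that situation isn't covered above.

```python
import re

def strip_outside(text, greedy=True):
    """Strip spaces outside brackets; greedy=True is case 1, False is case 2."""
    closing = re.compile(r".*\]" if greedy else r".*?\]")
    out = []
    while text:
        # Everything before the first "[" is outside the brackets.
        before, bracket, rest = text.partition("[")
        out.append(before.replace(" ", ""))
        if not bracket:
            break
        match = closing.match(rest)
        if match is None:
            # Assumption: an unclosed "[" is copied through unchanged.
            out.append(bracket + rest)
            break
        out.append(bracket + match.group(0))
        text = rest[match.end():]
    return "".join(out)

print(strip_outside("a b [ c [ d [ e ] f ] g", greedy=True))   # ab[ c [ d [ e ] f ]g
print(strip_outside("a b [ c [ d [ e ] f ] g", greedy=False))  # ab[ c [ d [ e ]f]g
```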
This works for me:
(\[.+?\])|\s
Then you simply pass in a replacement value of $1 when you call the replace function. The idea is to match the bracketed patterns first and put them back untouched via $1. Every whitespace character outside the brackets matches the second alternative instead, where $1 is empty, so it gets replaced with nothing.
Note that I tested this with Regex Hero (a .NET regex tester), and not in PHP. So I'm not 100% sure this will work for you.
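For what it's worth, a Python version of the same capture-and-restore idea might look like the sketch below. The function name is invented, and a callback stands in for the $1 replacement so the "group did not match" case is spelled out explicitly.

```python
import re

# Group 1 captures a bracketed chunk; the other branch matches one whitespace char.
BRACKET_OR_SPACE = re.compile(r"(\[.+?\])|\s")

def remove_spaces_keep_brackets(text):
    # Put back group 1 when it matched, otherwise replace with nothing.
    return BRACKET_OR_SPACE.sub(lambda m: m.group(1) or "", text)

print(remove_spaces_keep_brackets("a b [c d] e"))  # ab[c d]e
```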
That was an interesting one. Sounded simple at first, then seemed rather difficult. And then the solution I finally arrived at was indeed simple. I was surprised the solution didn't require a lookaround of any sort. And it should be faster than any method that uses a lookaround.
This doesn't sound like something you really want regex for. It's very easy to parse directly by reading through. Pseudo-code:
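inside_brackets = false;
result = "";
for (i = 0; i < length(str); i++) {
    if (str[i] == '[')
        inside_brackets = true;
    else if (str[i] == ']')
        inside_brackets = false;
    // copy everything except spaces found outside brackets
    // (deleting from str while indexing it would skip characters)
    if (inside_brackets || !is_space(str[i]))
        result += str[i];
}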
Anything involving regex is likely to involve lookaround assertions that get re-evaluated for every candidate space, which tends to be slower and less comprehensible.
To make this work for nested brackets, simply change inside_brackets to a counter, starting at zero, incrementing on open brackets, and decrementing on close brackets.
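Here is a minimal runnable sketch of the counter variant in Python. The function name is made up, and ignoring a stray "]" when the depth is already zero is an assumption on my part; unbalanced input isn't discussed above.

```python
def strip_spaces_outside_brackets(text):
    """Remove whitespace that is not enclosed in (possibly nested) brackets."""
    depth = 0  # number of currently open "[" brackets
    out = []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            # Assumption: a "]" with nothing open does not push depth negative.
            depth -= 1
        # Keep everything inside brackets, and all non-whitespace outside them.
        if depth > 0 or not ch.isspace():
            out.append(ch)
    return "".join(out)

print(strip_spaces_outside_brackets("a a[b [c c]b]"))  # aa[b [c c]b]
```

Building a new string rather than deleting in place avoids having to adjust the index after every removal.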
Resurrecting this question because it had a simple solution that wasn't mentioned.
\[[^]]*\](*SKIP)(*F)|\s+
The left side of the alternation matches complete sets of brackets, then deliberately fails; (*SKIP) keeps the engine from retrying inside the text it just consumed. The right side matches runs of whitespace (there is no capture group involved), and we know they are the right ones because any whitespace within brackets would already have been skipped by the expression on the left.
This means you can just do
$replace = preg_replace("~\[[^]]*\](*SKIP)(*F)|\s+~","",$string);
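Python's built-in re module has no backtracking verbs like (*SKIP)(*F), so if you wanted the same skip-or-remove behaviour there, one approximation is a replacement callback. Note this is a swapped-in technique rather than the PCRE pattern itself, and the function name is hypothetical.

```python
import re

# Left branch: a complete bracket pair. Right branch: a run of whitespace.
BRACKETS_OR_SPACES = re.compile(r"\[[^]]*\]|\s+")

def drop_outside_whitespace(text):
    def keep_brackets(match):
        token = match.group(0)
        # Bracket pairs are returned unchanged; whitespace runs are dropped.
        return token if token.startswith("[") else ""
    return BRACKETS_OR_SPACES.sub(keep_brackets, text)

print(drop_outside_whitespace("ab  cd [e  f]\tg"))  # abcd[e  f]g
```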
The following will match start-of-line or end-of-bracket (which must come before any space you want to match) followed by anything that isn't start-of-bracket or a space, followed by some space.
/((^|\])[^ \[]*) +/
Replacing all matches with $1 will remove the first block of spaces from each non-bracketed sequence. You will have to repeat the replacement until nothing matches any more to remove all spaces.
Example:
abcd efg [hij klm]nop qrst u
abcdefg [hij klm]nopqrst u
abcdefg[hij klm]nopqrstu
done
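The repeat-until-done loop from the example can be sketched in Python roughly as follows. The function name is invented for illustration, and \1 is Python's spelling of the $1 replacement.

```python
import re

# Start of string or "]", then non-space/non-"[" characters, then one or more spaces.
FIRST_SPACES = re.compile(r"((^|\])[^ \[]*) +")

def strip_by_repeated_passes(text):
    while True:
        # subn returns the new string and how many replacements were made.
        text, count = FIRST_SPACES.subn(r"\1", text)
        if count == 0:
            return text

print(strip_by_repeated_passes("abcd efg [hij klm]nop qrst u"))  # abcdefg[hij klm]nopqrstu
```

Every replacement removes at least one space, so the loop is guaranteed to stop.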