I need to extract from a string a set of characters which are included between two delimiters, without returning the delimiters themselves.
A simple example should b
I had the same problem using regex with bash scripting. I used a 2-step solution using pipes with grep -o applying
'\[(.*?)\]'
first, then
'\b.*\b'
Obviously not as efficient at the other answers, but an alternative.
[^\[]
Match any character that is not [.
+
Match 1 or more of the anything that is not [
. Creates groups of these matches.
(?=\])
Positive lookahead ]
. Matches a group ending with ]
without including it in the result.
Done.
[^\[]+(?=\])
Proof.
http://regexr.com/3gobr
Similar to the solution proposed by null. But the additional \]
is not required. As an additional note, it appears \
is not required to escape the [
after the ^
. For readability, I would leave it in.
Does not work in the situation in which the delimiters are identical. "more or less"
for example.
Here is how I got without '[' and ']' in C#:
var text = "This is a test string [more or less]";
//Getting only string between '[' and ']'
Regex regex = new Regex(@"\[(.+?)\]");
var matchGroups = regex.Matches(text);
for (int i = 0; i < matchGroups.Count; i++)
{
Console.WriteLine(matchGroups[i].Groups[1]);
}
The output is:
more or less
To remove also the [] use:
\[.+\]
I wanted to find a string between / and #, but # is sometimes optional. Here is the regex I use:
(?<=\/)([^#]+)(?=#*)
Easy done:
(?<=\[)(.*?)(?=\])
Technically that's using lookaheads and lookbehinds. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:
Alternatively you can just capture what's between the square brackets:
\[(.*?)\]
and return the first captured group instead of the entire match.