I want to validate a string in C# that contains a Boolean expression with brackets. The string should only contain numbers 1-9, round brackets, \"OR\" , \"AND\".
Exa
If you just want to validate the input string, you can write a simple parser. Each method consumes a certain kind of input (digit, brackets, operator) and returns the remaining string after matching. An exception is thrown if no match can be made.
public class ParseException : Exception { }
public static class ExprValidator
{
public static bool Validate(string str)
{
try
{
string term = Term(str);
string stripTrailing = Whitespace(term);
return stripTrailing.Length == 0;
}
catch(ParseException) { return false; }
}
static string Term(string str)
{
if(str == string.Empty) return str;
char current = str[0];
if(current == '(')
{
string term = LBracket(str);
string rBracket = Term(term);
string temp = Whitespace(rBracket);
return RBracket(temp);
}
else if(Char.IsDigit(current))
{
string rest = Digit(str);
try
{
//possibly match op term
string op = Op(rest);
return Term(op);
}
catch(ParseException) { return rest; }
}
else if(Char.IsWhiteSpace(current))
{
string temp = Whitespace(str);
return Term(temp);
}
else throw new ParseException();
}
static string Op(string str)
{
string t1 = Whitespace_(str);
string op = MatchOp(t1);
return Whitespace_(op);
}
static string MatchOp(string str)
{
if(str.StartsWith("AND")) return str.Substring(3);
else if(str.StartsWith("OR")) return str.Substring(2);
else throw new ParseException();
}
static string LBracket(string str)
{
return MatchChar('(')(str);
}
static string RBracket(string str)
{
return MatchChar(')')(str);
}
static string Digit(string str)
{
return MatchChar(Char.IsDigit)(str);
}
static string Whitespace(string str)
{
if(str == string.Empty) return str;
int i = 0;
while(i < str.Length && Char.IsWhiteSpace(str[i])) { i++; }
return str.Substring(i);
}
//match at least one whitespace character
static string Whitespace_(string str)
{
string stripFirst = MatchChar(Char.IsWhiteSpace)(str);
return Whitespace(stripFirst);
}
static Func<string, string> MatchChar(char c)
{
return MatchChar(chr => chr == c);
}
static Func<string, string> MatchChar(Func<char, bool> pred)
{
return input => {
if(input == string.Empty) throw new ParseException();
else if(pred(input[0])) return input.Substring(1);
else throw new ParseException();
};
}
}
It's probably simpler to do this with a simple parser. But you can do this with .NET regex by using balancing groups and realizing that if the brackets are removed from the string you always have a string matched by a simple expression like ^\d+(?:\s+(?:AND|OR)\s+\d+)*\z
.
So all you have to do is use balancing groups to make sure that the brackets are balanced (and are in the right place in the right form).
Rewriting the expression above a bit:
(?x)^
OPENING
\d+
CLOSING
(?:
\s+(?:AND|OR)\s+
OPENING
\d+
CLOSING
)*
BALANCED
\z
((?x)
makes the regex engine ignore all whitespace and comments in the pattern, so it can be made more readable.)
Where OPENING
matches any number (0 included) of opening brackets:
\s* (?: (?<open> \( ) \s* )*
CLOSING
matches any number of closing brackets also making sure that the balancing group is balanced:
\s* (?: (?<-open> \) ) \s* )*
and BALANCED
performs a balancing check, failing if there are more open brackets then closed:
(?(open)(?!))
Giving the expression:
(?x)^
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
(?:
\s+(?:AND|OR)\s+
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
)*
(?(open)(?!))
\z
If you do not want to allow random spaces remove every \s*
.
See demo at IdeOne. Output:
matched: '2'
matched: '1 AND 2'
matched: '12 OR 234'
matched: '(1) AND (2)'
matched: '(((1)) AND (2))'
matched: '1 AND 2 AND 3'
matched: '1 AND (2 OR (3 AND 4))'
matched: '1 AND (2 OR 3) AND 4'
matched: ' ( 1 AND ( 2 OR ( 3 AND 4 ) )'
matched: '((1 AND 7) OR 6) AND ((2 AND 5) OR (3 AND 4))'
matched: '(1)'
matched: '(((1)))'
failed: '1 2'
failed: '1(2)'
failed: '(1)(2)'
failed: 'AND'
failed: '1 AND'
failed: '(1 AND 2'
failed: '1 AND 2)'
failed: '1 (AND) 2'
failed: '(1 AND 2))'
failed: '(1) AND 2)'
failed: '(1)() AND (2)'
failed: '((1 AND 7) OR 6) AND (2 AND 5) OR (3 AND 4))'
failed: '((1 AND 7) OR 6) AND ((2 AND 5 OR (3 AND 4))'
failed: ''
ANTLR Parser Generator?
a short way of achieving this in C#
Although it may be an overkill if its just numbers and OR + AND
what you want are "balanced groups", with them you can get all bracet definitions, then you just need a simple string parsing
http://blog.stevenlevithan.com/archives/balancing-groups
http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition
If you consider a boolean expression as generated by a formal grammar writing a parser is easier.
I made an open source library to interpret simple boolean expressions. You can take a look at it on GitHub, in particular look at the AstParser
class and Lexer
.
Pretty simply:
At first stage you must determ lexems (digit, bracket or operator) with simple string comparsion.
At second stage you must define variable of count of closed bracket (bracketPairs), which can be calculated by the following algorithm for each lexem:
if current lexem is '(', then bracketPairs++;
if current lexem is ')', then bracketPairs--.
Else do not modify bracketPairs.
At the end if all lexems are known and bracketPairs == 0 then input expression is valid.
The task is a bit more complex, if it's necesery to build AST.