I have an application that needs to allow users to write expressions similar to excel:
(H1 + (D1 / C3)) * I8
and more complex things like
If(H1 = \'True
When faced with a similar situation - the need to handle short one-line expressions - I wrote a parser. The expressions were boolean logic, of the form
n1 = y and n2 > z
n2 != x or (n3 > y and n4 = z)
and so on. In english you could say that there are atoms joined by AND and OR, and each atom has three elements - a left-hand-side attribute, an operator, and a value. Because it was so succint I think the parsing was easier. The set of possible attributes is known and limited (eg: name, size, time). The operators vary by attribute: different attributes take different sets of operators. And the range and format of possible values vary according to attribute as well.
To parse, I split the string on whitespace using String.Split(). I later realized that prior to Split(), I needed to normalize the input string - inserting whitespace before and after parens. I did that with a regex.Replace().
The output of the split is an array of tokens. Then parsing occurs in a big for loop with a switch on the left-hand-side attribute value. With each go-round of the loop, I was set to slurp in a group of tokens. If the first token was an open-paren, then the group was just one token in length: the paren itself. For tokens that were well-known names - my attribute values - the parser had to slurp in a group of 3 tokens, one each for the name, the operator, and the value. If at any point there are not enough tokens, the parser throws an exception. Based on the stream of tokens, the parser state would change. A conjunction (AND,OR,XOR) meant to push the prior atom onto a stack, and when the next atom was finished, I'd pop the prior atom and join those two atoms into a compound atom. And so on. The state management happened at the end of each loop of the parser.
Atom current;
for (int i=0; i < tokens.Length; i++)
{
switch (tokens[i].ToLower())
{
case "name":
if (tokens.Length <= i + 2)
throw new ArgumentException();
Comparison o = (Comparison) EnumUtil.Parse(typeof(Comparison), tokens[i+1]);
current = new NameAtom { Operator = o, Value = tokens[i+2] };
i+=2;
stateStack.Push(ParseState.AtomDone);
break;
case "and":
case "or":
if (tokens.Length <= i + 3)
throw new ArgumentException();
pendingConjunction = (LogicalConjunction)Enum.Parse(typeof(LogicalConjunction), tokens[i].ToUpper());
current = new CompoundAtom { Left = current, Right = null, Conjunction = pendingConjunction };
atomStack.Push(current);
break;
case "(":
state = stateStack.Peek();
if (state != ParseState.Start && state != ParseState.ConjunctionPending && state != ParseState.OpenParen)
throw new ArgumentException();
if (tokens.Length <= i + 4)
throw new ArgumentException();
stateStack.Push(ParseState.OpenParen);
break;
case ")":
state = stateStack.Pop();
if (stateStack.Peek() != ParseState.OpenParen)
throw new ArgumentException();
stateStack.Pop();
stateStack.Push(ParseState.AtomDone);
break;
// more like that...
case "":
// do nothing in the case of whitespace
break;
default:
throw new ArgumentException(tokens[i]);
}
// insert housekeeping for parse states here
}
That's simplified, just a little. But the idea is that each case statement is fairly simple. It's easy to parse in an atomic unit of the expression. The tricky part was joining them all together appropriately.
That trick was accomplished in the housekeeping section, at the end of each slurp-loop, using the state stack and the atom stack. Different stuff can happen according to the parser state. As I said, in each case statement, the parser state might change, with the prior state getting pushed onto a stack. Then at the end of the switch statement, if the state said I had just finished parsing an atom, and there was a pending conjunction, I'd move the just-parsed atom into the CompoundAtom. The code looks like this:
state = stateStack.Peek();
if (state == ParseState.AtomDone)
{
stateStack.Pop();
if (stateStack.Peek() == ParseState.ConjunctionPending)
{
while (stateStack.Peek() == ParseState.ConjunctionPending)
{
var cc = critStack.Pop() as CompoundAtom;
cc.Right = current;
current = cc; // mark the parent as current (walk up the tree)
stateStack.Pop(); // the conjunction is no longer pending
state = stateStack.Pop();
if (state != ParseState.AtomDone)
throw new ArgumentException();
}
}
else stateStack.Push(ParseState.AtomDone);
}
The one other bit of magic was the EnumUtil.Parse. That allows me to parse things like "<" into an enum value. Suppose you define your enums like this:
internal enum Operator
{
[Description(">")] GreaterThan,
[Description(">=")] GreaterThanOrEqualTo,
[Description("<")] LesserThan,
[Description("<=")] LesserThanOrEqualTo,
[Description("=")] EqualTo,
[Description("!=")] NotEqualTo
}
Normally Enum.Parse looks for the symbolic name of the enum value, and < is not a valid symbolic name. EnumUtil.Parse() looks for the thing in the description. The code looks like this:
internal sealed class EnumUtil
{
///
/// Returns the value of the DescriptionAttribute if the specified Enum value has one.
/// If not, returns the ToString() representation of the Enum value.
///
/// The Enum to get the description for
///
internal static string GetDescription(System.Enum value)
{
FieldInfo fi = value.GetType().GetField(value.ToString());
var attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
if (attributes.Length > 0)
return attributes[0].Description;
else
return value.ToString();
}
///
/// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
/// Note: Utilised the DescriptionAttribute for values that use it.
///
/// The System.Type of the enumeration.
/// A string containing the name or value to convert.
///
internal static object Parse(Type enumType, string value)
{
return Parse(enumType, value, false);
}
///
/// Converts the string representation of the name or numeric value of one or more enumerated constants to an equivilant enumerated object.
/// A parameter specified whether the operation is case-sensitive.
/// Note: Utilised the DescriptionAttribute for values that use it.
///
/// The System.Type of the enumeration.
/// A string containing the name or value to convert.
/// Whether the operation is case-sensitive or not.
///
internal static object Parse(Type enumType, string stringValue, bool ignoreCase)
{
if (ignoreCase)
stringValue = stringValue.ToLower();
foreach (System.Enum enumVal in System.Enum.GetValues(enumType))
{
string description = GetDescription(enumVal);
if (ignoreCase)
description = description.ToLower();
if (description == stringValue)
return enumVal;
}
return System.Enum.Parse(enumType, stringValue, ignoreCase);
}
}
I got that EnumUtil.Parse() thing from somewhere else. Maybe here?