Best way to parse custom Filtersyntax

问题

I have a program which allows the user to enter a filter in a textbox in the column header of a DataGridView. This text is then parsed into a list of FilterOperations.

Currently i tokenize the string and then build the list in a hunge For-loop.

Which Desing Patterns could i use to get rid of the huge for consruct?

Are there any other actions i can take to improve the design?

In the current state its hard to add support for another operator, datatype or build something else thant the filterlist. Lets say i need to replace the filterlist with building an Expression (which will be the case soon) or building an SQL Where clause.

Filtersyntax

The filter follows this Syntax and is valid for Strings, Digits and DateTimes:

Rangeoperator

lowerLimit .. upperLimit

29..52 would be parsed to two elements in the filter list "x >= 29" and "x <=52"

LowerThan

.. upperLimit

..52 would be parsed to "x < 52"

GreaterThan

lowerLimit ..

29.. would be parsed to "x > 29"

Wildcard

*someText* would equal x LIKE "%someText%" in SQL

String literal

' operators like .. or * are ignored in between the single quotes '

Tokens

So i definded three Tokens

RangeOperator for ..

Wildcard for *

Text for pure values and the values in single quotes

My ugly code to build the list

public static FilterList<T> Parse<T>(string filter, string columnname, Type dataType) where T : class
        {
            if (dataType != typeof(float) && dataType != typeof(DateTime) && dataType != typeof(string))
                throw new NotSupportedException(String.Format("Data Type is not supported '{0}'", dataType));

            Token[] filterParts = tokenize(filter);
            filterParts = cleanUp(filterParts);

            StringBuilder sb = new StringBuilder();

            for (int i = 0; i < filterParts.Length; i++)
            {
                Token currentToken = filterParts[i];
                //BereichsFilter prüfen und bauen
                if (currentToken.TokenType == TokenType.RangeOperator)
                {
                    if (filterParts.Length < 2)
                    {
                        throw new FilterException("Missing argument for RangeOperator");
                    }
                    if (filterParts.Length > 3)
                    {
                        throw new FilterException("RangeOperator can't be mixed with other operators");
                    }

                    if (i == 0)
                    {
                        if (filterParts.Length == 2)
                        {
                            //Bis Operator
                            Token right = filterParts[1];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("Text must have value");
                            if (right.Text.StartsWith("."))
                                throw new FilterException("Text starting with a dot is not valid");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Less, right.Text } };
                            //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightF } };
                                //filterString = String.Format("({0} < {1} OR {0} IS NULL)", columnname, rightF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightDt } };
                                //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, rightDT.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                        throw new FilterException("too many arguments");
                    }
                    if (i == 1)
                    {
                        if (filterParts.Length == 2)
                        {
                            //Von Operator
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("Argument must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Greater, left.Text } };
                            //filterString = String.Format("({0} > '{1}')", columnname, left.Text);
                            if (dataType == typeof(float))
                            {
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format '{0}'", left.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftF } };
                                //filterString = String.Format("({0} > {1})", columnname, leftF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime leftDt = parseDateTime(left.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftDt } };
                                //filterString = String.Format("({0} > '{1}')", columnname, leftDT.ToString(CultureInfo.InvariantCulture));
                            }
                            break;
                        }
                        else
                        {
                            //BereichsOperator
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("parameter must have value");

                            Token right = filterParts[2];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("parameter must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, left.Text},
                                    {columnname, FilterOperator.LessOrEqual, right.Text}
                                };
                            //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, left.Text, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format'{0}'", left.Text));
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftF},
                                    {columnname, FilterOperator.LessOrEqual, rightF}
                                };
                                //filterString = String.Format("{0} >= {1} AND {0} <= {2}", columnname, leftF.ToString(CultureInfo.InvariantCulture), leftF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                DateTime leftDt = parseDateTime(left.Text); 
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftDt},
                                    {columnname, FilterOperator.LessOrEqual, rightDt}
                                };
                                //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, leftDT.ToString(CultureInfo.InvariantCulture), rightDT.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                    }
                    throw new FilterException("unexpected parameter");
                }
                //Stringsuche Bauen
                if (currentToken.TokenType == TokenType.Wildcard)
                {
                    if (dataType != typeof(string))
                        throw new FilterException("Operator not allowed with this Data Type");
                    //Fehler wenn Datentyp kein string
                    sb.Append("%");
                }
                else if (currentToken.TokenType == TokenType.Text)
                    sb.Append(escape(currentToken.Text));
            }

            //Filterung auf Zeichenfolge
            string text = sb.ToString();
            if (dataType == typeof(string))
                return new FilterList<T> { { columnname, FilterOperator.Like, text } };
            //filterString = String.Format("{0} LIKE '{1}' ESCAPE '\\'", columnname, text);
            if (dataType == typeof(DateTime))
            {
                DateTime dt = parseDateTime(text);
                return new FilterList<T> { { columnname, FilterOperator.Equal, dt } };
                //filterString = String.Format("{0} = '{1}'", columnname, DT.ToString(CultureInfo.InvariantCulture));
            }
            if (dataType == typeof(float))
            {
                float f;
                if (!float.TryParse(text, out f))
                    throw new FilterException(String.Format("parameter has wrong format '{0}'", text));
                return new FilterList<T> { { columnname, FilterOperator.Equal, f } };
                //filterString = String.Format("{0} = {1}", columnname, F.ToString(CultureInfo.InvariantCulture));
            }

            return null;
        }

回答1:

You need to find a code generator for C# that is based on Parsing Expression Grammars. It lets you define a grammar that is then turned into code by the generator. The code will then be able to parse the text obeying the grammar you are expecting.

A very quick google-fu shows that peg-sharp could work.

In order to learn using PEG you can try the online version of PEG.js which works almost along the workflow you'd be ultimately using:

type the PEG declaration (left window)
javascript parser is updated dynamically (top right window)
parser parses your input and produce a result (lower right window)

As a proof of concept, here is a tentative implementation of your grammar that you could copy paste in PEG.js (I guess one could manage to embed it in the stackoverflow widget):

Here is the syntax:

start
  = filters

filters
  = left:filter " " right:filters { return {filter: left, operation: "AND", filters: right};}
  / filter

filter
  = applicableRange:range {return {type: "range", range: applicableRange};}
 / openWord:wildcard  {return {type: "wildcard", word: openWord};}
 / simpleWord:word {return simpleWord;}
 / sentence:sentence {return sentence;}

sentence
 = "'" + letters:[0-9a-zA-Z *.]* "'" {return {type: "sentence", value: letters.join("")};}

word "aword"
  = letters:[0-9a-zA-Z]+ { return {type: "word", value: letters.join("")}; }

wildcard
  = 
 "*" word:word "*" {return {type: "wildcardBoth", value: word};}
/ "*" word:word {return {type: "wildcardStart", value: word};}
/ word:word "*" {return {type: "wildcardEnd", value: word};}

range "range"
  = left:word? ".." right:word? {return {from: left, to: right};}

Basically the grammar lets you define the building blocks of your language and how they are articulated in relation one to another. For example a filter can be a range, a wildcard, a word, a sentence or nothing at all (at least that's what i went for when defining the grammar; the last option is to end the recursion in filters).

Along with those blocks you can define what the output will be if these blocks are encountered. In this case I output a JSON object that expresses what kind of filtering should occur, and what parameters the filter will have.

If you test the grammar with the following input:

'testing range' 123..456 123.. ..abc 'and testing wildcards' word1* *word2 *word3* cool heh

you will get back a structure that describes the filters that should be built according to the grammar:

{
   "filter": {
      "type": "sentence",
      "value": "testing range"
   },
   "operation": "AND",
   "filters": {
      "filter": {
         "type": "range",
         "range": {
            "from": {
               "type": "word",
               "value": "123"
            },
            "to": {
               "type": "word",
               "value": "456"
            }
         }
      },
      "operation": "AND",
      "filters": {
         "filter": {
            "type": "range",
            "range": {
               "from": {
                  "type": "word",
                  "value": "123"
               },
               "to": null
            }
         },
         "operation": "AND",
         "filters": {
            "filter": {
               "type": "range",
               "range": {
                  "from": null,
                  "to": {
                     "type": "word",
                     "value": "abc"
                  }
               }
            },
            "operation": "AND",
            "filters": {
               "filter": {
                  "type": "sentence",
                  "value": "and testing wildcards"
               },
               "operation": "AND",
               "filters": {
                  "filter": {
                     "type": "wildcard",
                     "word": {
                        "type": "wildcardEnd",
                        "value": {
                           "type": "word",
                           "value": "word1"
                        }
                     }
                  },
                  "operation": "AND",
                  "filters": {
                     "filter": {
                        "type": "wildcard",
                        "word": {
                           "type": "wildcardStart",
                           "value": {
                              "type": "word",
                              "value": "word2"
                           }
                        }
                     },
                     "operation": "AND",
                     "filters": {
                        "filter": {
                           "type": "wildcard",
                           "word": {
                              "type": "wildcardBoth",
                              "value": {
                                 "type": "word",
                                 "value": "word3"
                              }
                           }
                        },
                        "operation": "AND",
                        "filters": {
                           "filter": {
                              "type": "word",
                              "value": "cool"
                           },
                           "operation": "AND",
                           "filters": {
                              "type": "word",
                              "value": "heh"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

The principle will be the same for the C# generator: compile the grammar into some C# code capable of parsing your inputs, and define what should happen when the parsing hits this or that block.

You will need to recompile the grammar if changes occur (though it can easily be included in your build step) but you will be able to generate a structure representing the filters that have been parsed and use it to filter your search results.

One huge advantage of PEG is that the format is well known and there plenty of sources for learning about it online, so the knowledge will be transferable to other languages / uses

回答2:

You can use Gold Parser to create your Syntax Tree or any other way to have it. here is the link http://goldparser.org/

In addition to that you can use the visitor design pattern to generate your filter list. https://en.wikipedia.org/wiki/Visitor_pattern

with these two you can make a pretty extensible solution.

来源：https://stackoverflow.com/questions/28211739/best-way-to-parse-custom-filtersyntax

标签

parsing

filter

tokenize