When is an issue too complex for a regular expression?

执念已碎 2021-02-01 19:42

Please don't answer the obvious, but what are the warning signs that tell us a problem should not be solved using regular expressions?

For example: Why is a complete emai

13 Answers
  • 2021-02-01 20:24

    When you need to parse an expression that's not defined by a regular language.

  • 2021-02-01 20:24

    Whenever you can't be sure it really solves the problem, for example:

    • HTML parsing
    • Email validation
    • Language parsers

    Especially so when there already exist tools that solve the problem in a totally understandable way.

    Regex can be used in the domains I mentioned, but only as a subset of the whole problem and for specific, simple cases.

    This goes beyond the technical limitations of regexes (regular languages + extensions): in most cases the maintainability and readability limit is reached long before the technical limit.
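
    As an illustration of "tools that already solve the problem", here is a minimal sketch that pulls links out of HTML with Python's standard-library html.parser module instead of a regex. The names LinkCollector and extract_links are made up for this example; only HTMLParser and its handle_starttag hook come from the library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. The parser has already
        # dealt with quoting style, attribute order, and letter case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> start tag found in the HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

    A regex for the same job would have to anticipate single versus double quotes, attributes in any order, upper-case tags, and links that only appear inside comments. Here the parser handles those cases and the code stays readable.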

  • 2021-02-01 20:27

    A sure sign to stop using regexps is this: if you have many grouping parentheses '()' and many alternatives '|', you are trying to do (complex) parsing with regular expressions.

    Add Perl extensions, backreferences, etc. to the mix and soon you have yourself a parser that is hard to read, hard to modify, and hard to reason about (e.g. is there an input on which this parser will take exponential time?).

    This is the time to stop regexing and start parsing (with a hand-written parser, a parser generator, or parser combinators).
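
    To make "start parsing" concrete, here is a minimal sketch of a hand-written recursive-descent parser. The grammar (integers, '+', '-', '*', and parentheses) and every name in it are assumptions chosen for illustration, not something from the question. A small regex is still used, but only to split the input into tokens.

```python
import re

# The regex only recognises individual tokens; structure is the parser's job.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Turn the input string into a list of number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := atom ('*' atom)*
        value = self.atom()
        while self.peek() == "*":
            self.take()
            value *= self.atom()
        return value

    def atom(self):
        # atom := NUMBER | '(' expr ')'
        token = self.take()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "(":
            value = self.expr()  # recursion handles arbitrary nesting
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value
```

    Each grammar rule is one short method, so adding an operator or a better error message is a local change rather than another layer of groups and alternatives.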

  • 2021-02-01 20:32

    Regular expressions are a textual representation of finite-state automata. That is to say, they are limited to non-recursive matching. This means that you can't have any concept of "scope" or of a nested "sub-match" in your regexp. Consider the following problem:

    (())()
    

    Are all the open parens matched with a close paren?

    Obviously, as human beings we can easily see that the answer is "yes". However, no true regular expression can reliably answer this question for arbitrarily deep nesting. To do this sort of processing, you need a full pushdown automaton (essentially a finite automaton with a stack). This is most commonly found in the guise of a parser, such as those generated by ANTLR or Bison.
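
    A minimal sketch of the check described above. Because there is only one kind of bracket, a single counter can stand in for the pushdown automaton's stack. The function name parens_balanced is made up for this example.

```python
def parens_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0  # number of currently open parens, i.e. the stack height
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A close paren appeared with nothing left to close.
                return False
    return depth == 0
```

    The counter can grow without bound, which is exactly what a finite-state automaton cannot do: it would need a separate state for every possible nesting depth.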

  • 2021-02-01 20:34

    A problem is too complex for regular expressions when constraints of the problem can change after the solution is written. So, in your example, how can you be sure an email address is valid when you do not have access to the target mail system to verify that the email address is attached to a valid user? You can't.

  • 2021-02-01 20:35

    Regular expressions are suited for tokenizing, finding or identifying individual bits of text, e.g. finding keywords, strings, comments, etc. in source code.

    Regular expressions are not suited for determining the relationship between multiple bits of text, e.g. finding a block of source code with properly paired braces. You need a parser for that. The parser can use regular expressions for tokenizing the input, while the parser itself determines how the different regex matches fit together.
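
    The division of labour described above could be sketched roughly like this. A regex identifies the individual tokens, and a small stack-based routine decides whether the brackets pair up. The C-like syntax ('//' line comments, double-quoted strings with backslash escapes) and the name braces_paired are assumptions made for this example.

```python
import re

# One alternative per token kind. Order matters: comments and strings are
# tried first so that brackets inside them are never seen as brackets.
TOKEN = re.compile(r"""
    (?P<comment>//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<open>[{(\[])
  | (?P<close>[})\]])
  | (?P<other>[^{}()\[\]"/]+|.)
""", re.VERBOSE | re.DOTALL)

PAIRS = {"}": "{", ")": "(", "]": "["}


def braces_paired(source):
    """Return True if all brackets outside strings/comments are paired."""
    stack = []
    for match in TOKEN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "open":
            stack.append(text)
        elif kind == "close":
            # The most recently opened bracket must be the matching one.
            if not stack or stack.pop() != PAIRS[text]:
                return False
        # Comments, strings, and other text carry no structure here.
    return not stack
```

    The regex never tries to understand nesting. It only answers "what kind of token starts here?", and the stack handles the relationship between tokens.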

    Essentially, you're going too far with your regular expressions if you start thinking about "balancing groups" (.NET's capture group subtraction feature) or "recursion" (Perl 5.10 and PCRE).
