When not to use Regex in C# (or Java, C++, etc.)

囚心锁ツ 2020-11-28 13:12

Many problems look as though a simple regular expression will solve them, but turn out to be very hard to solve with regex.

7 answers
  • 2020-11-28 13:53

    You should always learn regular expressions; only then can you judge when to use them. They usually become problematic when you need very good performance. But often it is a lot easier to use a regex than to write a big switch statement.

    Have a look at this question, which shows the elegance of a regex in contrast to the similar if() construct ...

  • 2020-11-28 13:54

    Use regular expressions for recognizing (regular) patterns in text. Don't use them for parsing text into data structures. Don't use regular expressions when the expression becomes very large.

    Often it's not clear when not to use a regular expression. For example, you shouldn't use regular expressions for proper email address verification. At first it may seem easy, but the specification for valid email addresses isn't as regular as you might think. You could use a regular expression for an initial search for email address candidates, but you need a parser to actually verify whether a candidate conforms to the given standard.
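
    A minimal Python sketch of that two-step idea: a loose regex only collects candidates, and a separate function decides whether each one is acceptable. The names CANDIDATE, find_candidates and looks_valid are made up for illustration, and looks_valid is only a simplified stand-in for a real standards-conforming address parser, not an implementation of the email specification.

```python
import re

# Deliberately loose: finds anything that merely *looks* like an address.
CANDIDATE = re.compile(r"[^\s@<>,;]+@[^\s@<>,;]+")

def find_candidates(text):
    """Step 1: regex search for possible addresses."""
    return CANDIDATE.findall(text)

def looks_valid(address):
    """Step 2 (hypothetical stand-in for a real parser): a few structural checks."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if len(local) > 64 or len(domain) > 253:
        return False
    # An unquoted local part may not start or end with a dot, or contain "..".
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    labels = domain.split(".")
    if len(labels) < 2:
        return False
    for label in labels:
        if not label or len(label) > 63:
            return False
        if label.startswith("-") or label.endswith("-"):
            return False
        if not all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in label):
            return False
    return True

text = "Mail alice@example.com or a..b@example.com today"
candidates = find_candidates(text)   # both strings are found by the regex
valid = [c for c in candidates if looks_valid(c)]
print(valid)  # ['alice@example.com']
```

    The regex stays short because it is not asked to decide validity; the rules that make addresses irregular live in ordinary code, where they are easier to read and extend.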

  • 2020-11-28 13:57

    I'm a beginner when it comes to regex, but IMHO it is worthwhile to spend some time learning basic regex. You'll realise that many, many problems you've solved differently could (and maybe should) be solved using regex.

    For a particular problem, try to find a solution at a site like regexlib, and see if you can understand the solution.

    As indicated above, regex might not be sufficient to solve a specific problem, but browsing a site like regexlib will certainly tell you if regex is the right solution to your problem.

  • 2020-11-28 14:04

    At the very least, I'd say learn regular expressions so that you understand them fully and are able to apply them in situations where they work. Off the top of my head I'd use regular expressions for:

    • Identifying parts of a string.
    • Checking whether a string conforms to a certain format or construction.
    • Finding substrings that match a certain pattern.
    • Transforming strings that fit a certain pattern into a different form (search-replace, capitalization, etc.).
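
    As a rough illustration, the four uses above could look like this in Python's re module. The sample log line and all variable names are invented for the example.

```python
import re

log_line = "2020-11-28 13:12 user=alice action=login"

# 1. Identifying parts of a string (named groups).
parts = re.match(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<clock>\d{2}:\d{2})", log_line)
date, clock = parts.group("date"), parts.group("clock")

# 2. Checking whether a string conforms to a format (whole-string match).
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", date) is not None

# 3. Finding substrings that match a pattern.
pairs = re.findall(r"(\w+)=(\w+)", log_line)

# 4. Transforming matches into a different form (search-replace).
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", log_line)

print(date, clock)  # 2020-11-28 13:12
print(pairs)        # [('user', 'alice'), ('action', 'login')]
print(swapped)      # 28/11/2020 13:12 user=alice action=login
```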

    At a theoretical level, regular expressions describe exactly the languages that finite state machines can recognize -- in computer science, these machines are Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA). You can use regular expressions to enforce some kind of validation on inputs -- regular expression engines simply interpret or convert regular expression patterns/strings into actual runtime operations.

    Once you know that the validity of your string (or data) can be tested by a DFA, you have a choice: implement that DFA yourself in your own code, or use a regular expression engine. You'll find that knowing about regular expressions will enhance your toolbox and your understanding of how complex string processing can get.
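
    To make that choice concrete, here is a small Python sketch that recognizes the same language (an optional sign followed by one or more digits) twice: once as a hand-written DFA with an explicit transition table, and once with a regex. The state names and function names are arbitrary choices for the example.

```python
import re

# The regex version: optional sign, then one or more ASCII digits.
INTEGER = re.compile(r"[+-]?[0-9]+")

# The hand-written DFA: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "sign"): "signed",
    ("start", "digit"): "digits",
    ("signed", "digit"): "digits",
    ("digits", "digit"): "digits",
}
ACCEPTING = {"digits"}

def classify(ch):
    """Map a character to the input class the DFA works with."""
    if ch in "+-":
        return "sign"
    if ch in "0123456789":
        return "digit"
    return "other"

def dfa_accepts(text):
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:          # no transition defined: reject
            return False
    return state in ACCEPTING

def regex_accepts(text):
    return INTEGER.fullmatch(text) is not None

for sample in ["42", "-7", "+", "", "4-2", "12a"]:
    assert dfa_accepts(sample) == regex_accepts(sample)
```

    Both functions accept and reject the same inputs; the regex is shorter, while the table makes every state and transition explicit.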

    Building on simple regular expressions, you can then learn about parsers and how they work. At the lowest level you're looking at lexical analysis (where regular expressions work), and at a higher level a grammar and semantic actions. These are the foundations on which compilers and interpreters, protocol parser implementations, and document rendering/transformation applications rely.
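
    A compact Python sketch of that layering, assuming a toy arithmetic language chosen just for illustration: a regex handles lexical analysis, a small recursive-descent parser handles the grammar, and the semantic action is simply to evaluate as it goes. The grammar and all names are the example's own.

```python
import re

# Lexical level: a regex recognizes individual tokens.
TOKEN = re.compile(r"\s*(?:(?P<num>[0-9]+)|(?P<op>[-+*/()]))")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.group("num") is not None:
            tokens.append(("num", int(m.group("num"))))
        else:
            tokens.append(("op", m.group("op")))
        pos = m.end()
    return tokens

# Grammar level:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := num | '(' expr ')'
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, value = advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            result = expr()          # nesting is handled by recursion
            if advance() != ("op", ")"):
                raise ValueError("expected ')'")
            return result
        raise ValueError("expected number or '('")

    def term():
        result = factor()
        while peek() in (("op", "*"), ("op", "/")):
            op = advance()[1]
            rhs = factor()
            result = result * rhs if op == "*" else result / rhs
        return result

    def expr():
        result = term()
        while peek() in (("op", "+"), ("op", "-")):
            op = advance()[1]
            rhs = term()
            result = result + rhs if op == "+" else result - rhs
        return result

    value = expr()
    if peek()[0] != "end":
        raise ValueError("unexpected trailing input")
    return value

print(parse(tokenize("2 * (3 + 4) - 5")))  # 9
```

    The regex never has to understand nesting or precedence; it only splits the input into tokens, and the recursive functions mirror the grammar rules one to one.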

  • 2020-11-28 14:14

    There are two aspects to consider:

    • Capability: Is the language you are trying to recognize a Type-3 language in the Chomsky hierarchy (a regular one)? If so, you might use regex; if not, you need a more powerful tool.

    • Maintainability: If it takes more time to write, test and understand a regular expression than its programmatic counterpart, then it's not appropriate. How to check this is complicated. I'd recommend peer review with your fellows (if they say "what the ..." when they see it, then it's too complicated), or just leave it undocumented for a few days, then take a look yourself and measure how long it takes you to understand it.
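
    One way to run that comparison is to put the regex next to its programmatic counterpart and ask reviewers which one they understand faster. A small Python sketch, assuming a 24-hour HH:MM format as the task (the format and function names are chosen only for the example):

```python
import re

TIME_24H = re.compile(r"(?:[01][0-9]|2[0-3]):[0-5][0-9]")

def is_time_regex(text):
    """Regex version: the hour and minute ranges are encoded in the pattern."""
    return TIME_24H.fullmatch(text) is not None

def is_time_plain(text):
    """Programmatic counterpart: same rule, spelled out step by step."""
    hours, sep, minutes = text.partition(":")
    if not sep or len(hours) != 2 or len(minutes) != 2:
        return False
    if not (hours.isascii() and hours.isdigit()):
        return False
    if not (minutes.isascii() and minutes.isdigit()):
        return False
    return int(hours) < 24 and int(minutes) < 60

for sample in ["13:12", "00:00", "23:59", "24:00", "7:05", "12:60", "12:3:4"]:
    assert is_time_regex(sample) == is_time_plain(sample)
```

    Here the regex is arguably still readable; once the pattern needs many more alternations to encode numeric ranges, the plain version tends to win the review.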

  • 2020-11-28 14:15

    Don't try to use regex to parse hierarchical text like program source (or nested XML): regular expressions are provably not powerful enough for that. For example, given a string of parens, they can't figure out whether the parens are balanced or not.

    Use parser generators (or similar technologies) for that.
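
    For the balanced-parens case specifically, the missing ingredient is memory that grows with the nesting depth, which a finite automaton does not have. A hand-written stack (not a parser generator, just the simplest tool that has that memory) is enough; this Python sketch uses made-up names and also handles square and curly brackets:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)            # remember what must be closed later
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False            # closer with no matching opener
    return not stack                    # leftover openers mean unbalanced

print(is_balanced("(a[b]{c})"))  # True
print(is_balanced("(()"))        # False
print(is_balanced("([)]"))       # False
```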

    Also, I'd not recommend using regex to validate data with strict formal standards, like e-mail addresses. These formats are harder than they look, and you'll end up with either an inaccurate regex or a very long one.
