Why can't Regular Expressions use keywords instead of characters?

甜味超标 2021-02-20 06:46

Okay, I barely understand RegEx basics, but why couldn't they design it to use keywords (like SQL) instead of some cryptic wildcard characters and symbols?

Is it for pe

14 Answers
  • 2021-02-20 07:08

    It's Perl's fault...!

    Actually, more specifically, Regular Expressions come from early Unix development, when concise syntax was valued much more highly than it is now. Storage, processing time, physical terminals, etc. were all very limited, rather unlike today.

    The history of Regular Expressions on Wikipedia explains more.

    There are alternatives to Regex, but I'm not sure any have really caught on.

    EDIT: Corrected by John Saunders: Regular Expressions were popularised by Unix, but first implemented by the QED editor. The same design constraints applied, even more so, to earlier systems.

  • 2021-02-20 07:08

    Perl 6 is taking a pretty revolutionary step forward in regex readability. Consider an address of the form: 100 E Main St Springfield MA 01234

    Here's a moderately-readable Perl 5 compatible regex to parse that (many corner cases not handled):

     m/
         ([1-9]\d*)\s+
         ((?:N|S|E|W)\s+)?
         (\w+(?:\s+\w+)*)\s+
         (ave|ln|st|rd)\s+
         ([[:alpha:]]+(?:\s+[[:alpha:]]+)*)\s+
         ([A-Z]{2})\s+
         (\d{5}(?:-\d{4})?)
      /ix;
    

    This Perl 6 regex has the same behavior:

    grammar USMailAddress {
         rule  TOP { <addr> <city> <state> <zip> }
    
         rule  addr { <[1..9]>\d* <direction>?
                      <streetname> <streettype> }
         token direction { N | S | E | W }
         token streetname { \w+ [ \s+ \w+ ]* }
         token streettype {:i ave | ln | rd | st }
         token city { <alpha>+ [ \s+ <alpha>+ ]* }
         token state { <[A..Z]>**{2} }
         token zip { \d**{5} [ - \d**{4} ]? }
      }
    

    A Perl 6 grammar is a class, and the tokens are all invokable methods. Use it like this:

    if $addr ~~ m/^<USMailAddress::TOP>$/ {
         say "$<city>, $<state>";
    }
    

    This example comes from a talk I presented at the Frozen Perl 2009 workshop. The Rakudo implementation of Perl 6 is complete enough that this example works today.
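
    For comparison, here is a rough Python sketch of the same idea using verbose mode and named groups, which is about as close as a classic regex engine gets to keyword-like readability. The group names and the parse_address helper are my own choices for illustration, and, like the regexes above, the pattern ignores many real-world corner cases:

```python
import re

# Verbose-mode pattern with named groups. This is an illustrative sketch,
# not a validated address parser; the group names are assumptions.
ADDRESS = re.compile(
    r"""
    ^
    (?P<number>     [1-9]\d* )                      \s+
    (?: (?P<direction> [NSEW] )                     \s+ )?
    (?P<streetname> \w+ (?: \s+ \w+ )*? )           \s+
    (?P<streettype> ave | ln | st | rd )            \s+
    (?P<city>       [a-z]+ (?: \s+ [a-z]+ )*? )     \s+
    (?P<state>      [a-z]{2} )                      \s+
    (?P<zip>        \d{5} (?: -\d{4} )? )
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_address(text):
    """Return the named parts of an address as a dict, or None if no match."""
    match = ADDRESS.match(text)
    return match.groupdict() if match else None
```

    Calling parse_address("100 E Main St Springfield MA 01234") returns a dict whose "city" entry is "Springfield" and whose "state" entry is "MA". It is still symbols underneath, but the names carry most of the meaning.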

  • 2021-02-20 07:11

    Because it corresponds to formal language theory and its mathematical notation.

  • 2021-02-20 07:11

    Regular expressions have a mathematical (actually, language theory) background and are coded somewhat like a mathematical formula. You can define them by a set of rules, for example

    • every character is a regular expression, representing itself
    • if a and b are regular expressions, then a?, a|b and ab are regular expressions, too
    • ...
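
    To make the "set of rules" concrete, here is a minimal Python sketch in which each listed rule becomes one small class, plus a naive matcher over them. The class names (Char, Opt, Alt, Cat) and the functions are made up for illustration and cover only the rules listed above. Real engines work quite differently:

```python
from dataclasses import dataclass

# Each class mirrors one rule of the inductive definition. The names are
# illustrative assumptions, not taken from any library.


@dataclass(frozen=True)
class Char:
    c: str  # a single character, representing itself


@dataclass(frozen=True)
class Opt:
    inner: object  # a?


@dataclass(frozen=True)
class Alt:
    left: object  # a|b
    right: object


@dataclass(frozen=True)
class Cat:
    first: object  # ab
    second: object


def ends(rx, text, start):
    """Return every position where a match of rx beginning at start can end."""
    if isinstance(rx, Char):
        return {start + 1} if text[start:start + 1] == rx.c else set()
    if isinstance(rx, Opt):
        return {start} | ends(rx.inner, text, start)
    if isinstance(rx, Alt):
        return ends(rx.left, text, start) | ends(rx.right, text, start)
    if isinstance(rx, Cat):
        return {
            end
            for mid in ends(rx.first, text, start)
            for end in ends(rx.second, text, mid)
        }
    raise TypeError(f"not a regular expression: {rx!r}")


def fullmatch(rx, text):
    """True if rx matches the whole of text."""
    return len(text) in ends(rx, text, 0)


# The pattern ab?|c, built only from the rules above.
pattern = Alt(Cat(Char("a"), Opt(Char("b"))), Char("c"))
```

    Here fullmatch(pattern, "ab") and fullmatch(pattern, "c") are true, while fullmatch(pattern, "b") is false. Note how much longer the constructor form is than the five characters of ab?|c.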

    Using a keyword-based language would be a great burden for simple regular expressions. Most of the time, you will just use a simple text string as the search pattern:

    grep -R 'main' *.c
    

    Or maybe very simple patterns:

    grep -c ':-[)(]' seidl.txt
    

    Once you get used to regular expressions, this syntax is very clear and precise. In more complicated situations you will probably use something else since a large regular expression is obviously hard to read.
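
    As a rough illustration of that burden, here is a Python sketch of what a keyword-style notation might look like. The helpers literal, one_of and sequence are hypothetical (not a real library). Each one just returns an ordinary regex fragment, so the keyword version and the terse version end up matching the same text:

```python
import re


# Hypothetical keyword-style helpers, invented for this example. Each
# returns an ordinary regex fragment as a string.
def literal(text):
    """Match text exactly, escaping any regex metacharacters."""
    return re.escape(text)


def one_of(chars):
    """Match any single character from chars."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def sequence(*parts):
    """Match each part in order."""
    return "".join(parts)


# Keyword version of the smiley pattern ':-[)(]' from the grep example.
smiley_keywords = sequence(literal(":-"), one_of(")("))
smiley_terse = r":-[)(]"


def count_matching_lines(pattern, lines):
    """Count lines containing a match, roughly what grep -c reports."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))
```

    Both patterns count the same lines, but the keyword form takes a full line of code to say what the terse form says in seven characters.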

  • 2021-02-20 07:13

    Because the idea of regular expressions--like many things that originate from UNIX--is that they are terse, favouring brevity over readability. This is actually a good thing. I've ended up writing regular expressions (against my better judgement) that are 15 lines long. If that had a verbose syntax it wouldn't be a regex, it'd be a program.

  • 2021-02-20 07:15

    Actually, no, the world did not begin with Unix. If you read the Wikipedia article, you'll see that

    "In the 1950s, mathematician Stephen Cole Kleene described these models using his mathematical notation called regular sets. The SNOBOL language was an early implementation of pattern matching, but not identical to regular expressions. Ken Thompson built Kleene's notation into the editor QED as a means to match patterns in text files. He later added this capability to the Unix editor ed, which eventually led to the popular search tool grep's use of regular expressions."
