In “aa67bc54c9”, is there any way to print “aa” 67 times, “bc” 54 times and so on, using regular expressions?

谎友^ 2020-12-15 14:13

I was asked this question in an interview for an internship, and the first solution I suggested was to try and use a regular expression (I usually am a little stumped in int

5 Answers
  • 2020-12-15 14:23

    Answering your question directly:

    • No, regular expressions match text and don't print anything, so there is no way to do it solely using regular expressions.

    The regular expression you gave will match one string/number pair; you can then print that repeatedly using an appropriate mechanism. The Perl solution from @tster is about as compact as it gets. (It doesn't use the names that you applied in your regex; I'm pretty sure that doesn't matter.)

    The remaining details depend on your implementation language.
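
    As a minimal Python sketch of that division of labour: the regex below only finds each string/number pair, and ordinary string multiplication does the repeating. The pattern and the function name `expand` are assumptions made for illustration, not the asker's original regex.

```python
import re

# Hypothetical pattern: a run of letters followed by a run of digits.
PAIR = re.compile(r"([A-Za-z]+)(\d+)")


def expand(encoded: str) -> str:
    """Repeat each letter run by the number that follows it."""
    pieces = []
    for match in PAIR.finditer(encoded):
        text, count = match.group(1), int(match.group(2))
        # The regex only located the pair; the repetition is plain code.
        pieces.append(text * count)
    return "".join(pieces)


if __name__ == "__main__":
    print(expand("ab2c3"))  # prints "ababccc"
```

    Calling `expand("aa67bc54c9")` would give "aa" 67 times, then "bc" 54 times, then "c" 9 times.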

  • 2020-12-15 14:38

    Nope, this is your basic 'trick question': no matter how you answer it, that answer is wrong unless you give exactly the answer the interviewer was trained to parrot. See the workup of the issue given by Pavel Shved, and note that all invocations have 'not' as a common condition. The tool just keeps sliding: even when it changes state, there is no counter in that state.

    I have a rather advanced book on the matter by Kenneth C. Louden, a college professor, in which the issue at hand is codified as "Regex's can't count." The obvious answer to the question seems to me at the moment to be using the lookahead feature of regexes ...

    It probably depends on what build of what brand of regex the interviewer is using, which probably depends on the flight dynamics of golf balls.

  • 2020-12-15 14:42

    how about:

    while ($line =~ s/^([a-z]+)(\d+)//i)
    {
        print $1 x $2;
    }
    
  • 2020-12-15 14:42

    Nice answers so far. Regular expressions alone are generally thought of as a way to match patterns, not generate output in the manner you mentioned.

    Having said that, there is a way to use regex as part of the solution. @Jonathan Leffler made a good point in his comment to tster's reply: "... maybe you need a better regex library in your language."

    Depending on your language of choice and the library available, it is possible to pull this off. Using C# and .NET, for example, this could be achieved via the Regex.Replace method. However, the solution is not 100% regex since it still relies on other classes and methods (StringBuilder, String.Join, and Enumerable.Repeat) as shown below:

    string input = "aa67bc54c9";
    string pattern = @"([a-z]+)(\d+)";
    string result = Regex.Replace(input, pattern, m =>
            // can be achieved using StringBuilder or String.Join/Enumerable.Repeat
            // don't use both
            //new StringBuilder().Insert(0, m.Groups[1].Value, Int32.Parse(m.Groups[2].Value)).ToString()
            String.Join("", Enumerable.Repeat(m.Groups[1].Value, Int32.Parse(m.Groups[2].Value)).ToArray())
             + Environment.NewLine // comment out to prevent line breaks
            );
    Console.WriteLine(result);
    

    A clearer solution would be to identify the matches, loop over them, and append them using a StringBuilder rather than rely on Regex.Replace. Other languages may have compact idioms for string multiplication that don't rely on other library classes.

    To answer the interview question, I would reply with, "it's possible, however the solution would not be a stand-alone 100% regex approach and would rely on other language features and/or libraries to handle the generation aspect of the question since the regex alone is helpful in matching patterns, not generating them."

    And based on the other responses here you could beef up that answer further if needed.

  • 2020-12-15 14:43

    Do you know why "regular expressions" are called "regular"? :-)

    That would be too long to explain, so I'll just outline the way. To match a pattern (i.e. decide whether a given string is "valid" or "invalid"), a theoretical computer scientist would use a finite state automaton. That's an abstract machine that has a finite number of states; on each tick it reads a character from the input and jumps to another state. The rule for where to jump from a particular state when a particular character is read is fixed. Some states are marked as "OK" and some as "FAIL", so that by examining the state of the machine you can check whether your text is "valid" (e.g. a valid e-mail address).

    For example, this machine only accepts "nice" as its "valid" word (a pic from Wikipedia):

    [Image: state diagram from the Wikipedia article referenced above]
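
    Since the picture is not reproduced here, a rough Python sketch of such a machine may help. The state numbering, the `DEAD` state, and the function name `accepts` are assumptions for illustration and need not match the Wikipedia diagram.

```python
# Transition table: (state, char) -> next state. States 0..4 record how much
# of "nice" has been read so far; any pair not listed goes to the dead state.
TRANSITIONS = {
    (0, "n"): 1,
    (1, "i"): 2,
    (2, "c"): 3,
    (3, "e"): 4,
}
ACCEPTING = {4}  # the only "OK" state
DEAD = -1        # the "FAIL" state; there is no way out of it


def accepts(word: str) -> bool:
    """Run the automaton over `word` and report whether it ends in an OK state."""
    state = 0
    for char in word:
        state = TRANSITIONS.get((state, char), DEAD)
    return state in ACCEPTING
```

    The machine keeps no memory besides its current state, which is one of six fixed values however long the input is.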

    The set of "valid" words that such a machine can theoretically distinguish from invalid ones is called a "regular language". Not every set is a regular language: for example, finite state automata are incapable of checking whether the parentheses in a string are balanced.
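
    To illustrate the parentheses example, here is a small Python sketch of the usual way to check balance. It relies on an integer `depth` that can grow without bound, which is exactly the kind of memory a finite state automaton lacks. The function name `balanced` is made up for this sketch.

```python
def balanced(text: str) -> bool:
    """Check that every ")" closes an earlier "(" and nothing stays open."""
    depth = 0  # unbounded counter: may become as large as the input is long
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ")" with nothing left to close
                return False
    return depth == 0
```

    A machine with a fixed number of states can only track nesting up to some fixed depth, so a deep enough input will always fool it.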

    But constructing state machines was a complex task compared to the complexity of defining what "valid" is. So the mathematicians (mainly S. Kleene) noted that every regular language could be described with a "regular expression". These expressions had *s and |s and were the prototypes of what we now know as regexps.


    What does this have to do with the problem? The problem in question is essentially non-regular. It can't be expressed with anything that works like a finite automaton.

    The essence is that the solution needs a memory cell capable of holding an arbitrary number (the repetition count in your case). Finite automata and classical regular expressions cannot do this.

    However, modern regexps are more expressive and are said to be able to check balanced parentheses! But this may serve as a good example of why you shouldn't use regexps for tasks they don't suit. Not to mention that such an expression contains code snippets, which makes it far from being "regular".

    To answer the initial question: you can't solve your problem using only something "regular". However, regexps could aid you in solving this problem, as in tster's answer.


    Perhaps I should look more closely at tster's answer (do a "+1" there, please!) and show why it's not a "regular expression" solution. One may think that it is: it just contains a print statement (not essential) and a loop, and the loop concept is compatible with the expressive power of a finite state automaton. But there is one more elusive thing:

    while ($line =~ s/^([a-z]+)(\d+)//i)
    {
        print $1 
                 x  # <--- this one
                   $2;
    }
    

    The task of reading a string and a number and printing that string the given number of times, where the number is an arbitrary integer, cannot be done on a finite state machine without additional memory. You use a memory cell to keep that number, decrease it, and check whether it is still greater than zero. But this number may be arbitrarily big, which contradicts the finite memory available to a finite state machine.
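
    Spelled out in Python, what hides behind the `x` operator might look like the sketch below. The names `repeat_with_counter` and `remaining` are hypothetical; the point is only that `remaining` is a memory cell whose value is not bounded in advance.

```python
def repeat_with_counter(text: str, count: int) -> str:
    """Repeat `text` `count` times using an explicit countdown."""
    pieces = []
    remaining = count      # the memory cell holding an arbitrary number
    while remaining > 0:   # check that it is still greater than zero
        pieces.append(text)
        remaining -= 1     # decrease it
    return "".join(pieces)
```

    For example, `repeat_with_counter("bc", 54)` yields "bc" written 54 times, and nothing in the code limits how large `count` may be.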

    However, there's nothing wrong with the classical pattern /([abc]*){5}/, which matches something "regular" repeated a fixed number of times. We essentially have states that correspond to "matched pattern once", "matched pattern twice" ... "matched pattern 5 times". There's a finite number of them, and that's the gist of the difference.
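
    One way to see why a fixed count is harmless: it can be written out by hand, with no counter left in the pattern. The Python sketch below uses the simplified unit `[abc]` (an assumption, chosen instead of `[abc]*` to keep the example easy to check) and a made-up helper named `unroll`.

```python
import re


def unroll(unit: str, times: int) -> str:
    """Write `unit` out `times` times in a row, without any {n} quantifier."""
    return f"(?:{unit})" * times


# The same language described twice: once with {5}, once spelled out.
counted = re.compile(r"(?:[abc]){5}")
unrolled = re.compile(unroll("[abc]", 5))


def agree(text: str) -> bool:
    """Report whether both patterns give the same verdict on `text`."""
    return bool(counted.fullmatch(text)) == bool(unrolled.fullmatch(text))
```

    Under these assumptions, the two patterns accept exactly the same strings. The count 5 is baked into the pattern's length and is never read from the input, which is what keeps the number of states finite.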
