Can Perl be “statically” parsed?

醉梦人生 2020-12-30 10:42

An article called "Perl cannot be parsed, a formal proof" is doing the rounds. So, does Perl decide the meaning of its parsed code at "run-time" or "compile-time"?

5 Answers
  • 2020-12-30 11:10

    Perl has BEGIN blocks, which run user Perl code at compile-time. This code can affect the meaning of other code that is yet to be compiled, thus making it "impossible" to parse Perl.

    For example, the code:

    sub foo { return "OH HAI" }
    

    is "really":

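    BEGIN {
        no strict 'refs';    # allow assigning to a glob by name
        # install the sub into the current package's symbol table
        *{ __PACKAGE__ . "::foo" } = sub { return "OH HAI" };
    }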
    

    That means that someone could write Perl like:

    BEGIN {
        no strict 'refs';    # allow assigning to a glob by name
        print "Hi user, type the code for foo: ";
        my $code = <STDIN>;  # e.g. sub { return "OH HAI" }
        *{ __PACKAGE__ . "::foo" } = eval $code;
    }
    

    Obviously, no static analysis tool can guess what code the user is going to type in here. (And if the user says sub ($) {} instead of sub {}, it will even affect how calls to foo are interpreted throughout the rest of the program, potentially throwing off the parsing.)
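
    To make the prototype point concrete, here is a minimal sketch that compiles the same line of source after two different declarations of a sub. The names whatever, how_it_parsed and the DemoN packages are made up for illustration. With an empty prototype, the slash is parsed as division and the rest of the line is a comment; without one, the slash opens a regex and the trailing return becomes live code.

```perl
use strict;
use warnings;

# One line of source text; how it parses depends on how whatever() was declared.
my $line = q{whatever / 25 ; # / ; return "regex";};

my $counter = 0;

# Compile $line inside a fresh package after the given declaration and
# report which branch of the resulting code actually ran.
sub how_it_parsed {
    my ($declaration) = @_;
    my $package = "Demo" . $counter++;    # hypothetical package names
    my $source = join "\n",
        "package $package;",
        "no warnings;",                   # silence void-context and undef noise
        $declaration,
        "sub run {",
        "    $line",
        "    return 'division';",
        "}",
        "run();";
    my $result = eval $source;
    die $@ if $@;
    return $result;
}

print how_it_parsed('sub whatever() { 50 }'), "\n";   # empty prototype
print how_it_parsed('sub whatever   { 50 }'), "\n";   # no prototype
```

    A static tool that has not seen (or cannot evaluate) the declaration has no way to tell which of the two readings is the right one.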

    The good news is that the impossible cases are very corner-casey; technically possible, but almost certainly useless in real code. So if you are writing a static analysis tool, this will probably cause you no trouble.

    To be fair, every language worth its salt has this problem, or something similar. As an example, throw your favorite code walker at this Lisp code:

    (iter (for i from 1 to 10) (collect i))
    

    You probably can't predict that this is a loop that produces a list, because the iter macro is opaque and would require special knowledge to understand. The reality is that this is annoying in theory (I can't understand my code without running it, or at least running the iter macro, which may not ever stop running with this input), but very useful in practice (iteration is easy for the programmer to write and the future programmer to read).

    Finally, a lot of people think that Perl lacks static analysis and refactoring tools like those Java has because of the relative difficulty of parsing it. I doubt this is true; I just think the need is not there and nobody has bothered to write them. (People do need a "lint", so there is Perl::Critic, for example.)

    Any static analysis I have needed to do of Perl to generate code (some emacs macros for maintaining test counters and Makefile.PL) has worked fine. Could weird corner cases throw off my code? Of course, but I don't go out of my way to write code that's impossible to maintain, even though I could.

  • 2020-12-30 11:12

    Perl has a well-defined "compile time" phase, which is followed by a well-defined "runtime" phase. However, there are ways of transitioning from one to the other. Many dynamic languages have eval constructs that allow compilation of new code during the runtime phase; in Perl the inverse is possible as well -- and common. BEGIN blocks (and the implicit BEGIN block caused by use) invoke a temporary runtime phase during compile-time. A BEGIN block is executed as soon as it's compiled, instead of waiting for the rest of the compilation unit (i.e. current file or current eval) to compile. Since BEGINs run before the code that follows them is compiled, they can influence the compilation of the following code in practically any way (although in practice the main things they do are to import or define subroutines, or to enable strictness or warnings).

    A use Foo; is basically equivalent to BEGIN { require Foo; Foo->import(); }, with require being (like eval STRING) one of the ways to invoke compile-time from runtime, meaning that we're now within compile-time within runtime within compile-time and the whole thing is recursive.
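
    As a small illustration of the ordering described above (a sketch, not anything from perl's own documentation; the @order and $largest names are invented here), a BEGIN block placed between two ordinary statements still runs first, and a hand-written BEGIN { require ...; ...->import } makes an imported sub known to the parser for the code that follows:

```perl
use strict;
use warnings;

our @order;   # package variable, so it already exists at compile time

push @order, "first runtime statement";
BEGIN { push @order, "BEGIN block" }   # executed as soon as it is compiled
push @order, "second runtime statement";

# Roughly what `use List::Util qw(max);` does: load and import at compile time.
BEGIN { require List::Util; List::Util->import('max'); }

# max() was imported during compilation, so the parser already knows it is
# a subroutine and accepts the call without parentheses.
my $largest = max 3, 7, 5;

print "$_\n" for @order;
print "largest: $largest\n";
```

    The BEGIN entry ends up first in @order even though it appears second in the file.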

    Anyway, what it boils down to for the decidability of parsing Perl is that since the compilation of one bit of code can be influenced by the execution of a preceding piece of code (which can in theory do anything), we've got ourselves a halting-problem type situation; the only way to correctly parse a given Perl file in general is by executing it.

  • 2020-12-30 11:14

    C++ has a similar problem in its template system, but that doesn't stop compilers from compiling it. They will just break out or run forever on the corner cases where this sort of argument would apply.

  • 2020-12-30 11:19

    Perl has a compile phase, but it differs from the compile phase of most languages. Perl's lexer turns the code into tokens, then a parser analyzes the tokens to form an op tree. However, BEGIN {} blocks can interrupt this process and let you execute code; the same happens when you use a module. BEGIN blocks execute as soon as they are compiled, before the rest of the program runs, giving you a way to set up modules and namespaces. During the overall "compile" of a script, you will most likely use Perl code to determine how the Perl module should look when it's done. A bare sub declaration implies adding the sub to the glob for the package, but you don't have to do it that way. For example, this is an (admittedly odd) way of setting up methods in a module:

    package Foo;
    
    use strict;
    use warnings;
    use List::Util qw/shuffle/;
    
    my @names = qw(foo bar baz bill barn);
    my @subs = (
        sub { print "baz!" },
        sub { die; },
        sub { return sub { die } },
    );
    @names = shuffle @names;
    foreach my $index (0..$#subs) {
       no strict 'refs';
       *{$names[$index]} = $subs[$index];
    }
    
    1;
    

    You have to interpret this to even know what it does! It's not very useful, and you cannot determine ahead of time which name ends up bound to which sub. But it's 100% valid Perl. Even though this feature can be abused, it can also do great things, like programmatically building complicated subs that all look very similar. It also makes it hard to know, for certain, what everything does.
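
    For the useful side of the same feature, here is a minimal sketch of generating near-identical subs in a loop. The Point class and its x, y and z fields are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

package Point;

# Generate one accessor per field instead of writing three similar subs.
for my $field (qw(x y z)) {
    no strict 'refs';    # allow assigning to a glob by name
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        $self->{$field} = shift if @_;   # act as a setter when given a value
        return $self->{$field};
    };
}

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

my $point = Point->new(x => 1, y => 2, z => 3);
$point->z(10);
print join(" ", $point->x, $point->y, $point->z), "\n";
```

    No sub named x, y or z appears anywhere in the source text, which is exactly what makes this hard for a purely static tool to follow.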

    That's not to say that a Perl script can't be 'compiled'. In Perl, compiling merely determines what the module should look like at that moment. You can do that with

    perl -c myscript.pl
    

    and it will tell you whether or not it can get to the point where it will start executing the main module. You just can't merely know from looking at it 'statically'.

    However, as PPI demonstrates, we can get close. Really close. Close enough to do very interesting things, like (almost static) code analysis.

    "Run Time", then, becomes what happens after all the BEGIN blocks have executed. (This is a simplification; there is a lot more to this. See perlmod for more.) It's still perl code being run, but it's a separate phase of execution, done after all the higher priority blocks have run.

    chromatic has some detailed posts on his Modern::Perl blog:

    • How a Perl 5 Program Works
    • On Parsing Perl 5
  • 2020-12-30 11:22

    People have used a lot of words to explain various phases, but it's really a simple matter. While compiling Perl source, the perl interpreter may end up running code that changes how the rest of the code will parse. Static analysis, which runs no code, will miss this.

    In that Perlmonks post, Jeffrey talks about his articles in The Perl Review that go into much more detail, including a sample program that doesn't parse the same way every time you run it.
