Emulation of lex-like functionality in Perl or Python

梦毁少年i 2021-01-13 23:46

Here's the deal. Is there a way to tokenize the strings in a line based on multiple regexes?

One example:

I have to get all href tags, their corresponding

8 Answers
  •  执笔经年
    2021-01-14 00:33

    Modifying Bruno's example to include error checking:

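    use strict;
    use warnings;

    my $input = "...";
    while (1) {
        # \G anchors each match where the previous one ended; /c keeps pos() on failure.
        if ($input =~ /\G(\w+)/gc) { print "word: '$1'\n"; next }
        if ($input =~ /\G(\s+)/gc) { print "whitespace: '$1'\n"; next }

        # No token matched: anything other than end of input is an error.
        # pos() is undef until the first successful match, hence the // 0.
        if ($input !~ /\G\z/gc)  { print "tokenizing error at character " . (pos($input) // 0) . "\n" }
        print "done!\n"; last;
    }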
    

    (Note that using scalar //g is unfortunately the one place where you really can't avoid using the $1, etc. variables.)
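
Since the question also asks about Python, here is a rough equivalent of the loop above, offered as a sketch rather than a definitive implementation. The names `TOKEN_SPEC` and `tokenize` are made up for illustration, and raising `ValueError` instead of printing the error is my own choice. The only library behaviour it relies on is that a compiled pattern's `match(text, pos)` matches only at offset `pos`, which stands in for Perl's `\G`.

```python
import re

# Hypothetical token table: (name, compiled regex), tried in order.
TOKEN_SPEC = [
    ("word", re.compile(r"\w+")),
    ("whitespace", re.compile(r"\s+")),
]


def tokenize(text):
    """Yield (name, lexeme) pairs; raise ValueError at the first unmatched offset."""
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            # match(text, pos) only succeeds if the pattern matches
            # starting exactly at pos, like \G in the Perl version.
            m = pattern.match(text, pos)
            if m:
                yield name, m.group()
                pos = m.end()
                break
        else:
            # No pattern matched here: the equivalent of the \G\z check failing.
            raise ValueError(f"tokenizing error at character {pos}")
```

Calling `list(tokenize("foo  bar"))` gives the tokens in order. Neither pattern can match an empty string, so `pos` always advances and the loop terminates.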
