Emulation of lex-like functionality in Perl or Python


梦毁少年i

Here's the deal. Is there a way to have strings tokenized in a line based on multiple regexes?

One example:

I have to get all href tags, their corresponding


8 Answers

执笔经年

2021-01-14 00:33

Modifying Bruno's example to include error checking:

my $input = "...";
while (1) {
    if ($input =~ /\G(\w+)/gc) { print "word: '$1'\n"; next }
    if ($input =~ /\G(\s+)/gc) { print "whitespace: '$1'\n"; next }

    if ($input !~ /\G\z/gc)  { print "tokenizing error at character " . pos($input) . "\n" }
    print "done!\n"; last;
}

(Note that using scalar //g is unfortunately the one place where you really can't avoid using the $1, etc. variables.)
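For readers on the Python side of the question, the same loop can be sketched with the standard `re` module. This is only a rough equivalent, not part of the original answer: the names `TOKEN_SPEC` and `tokenize` are made up for illustration, and a compiled pattern's `match(text, pos)` is used as a stand-in for Perl's `\G` anchor.

```python
import re

# Hypothetical token table: (name, pattern) pairs, tried in order at each position.
TOKEN_SPEC = [
    ("word", re.compile(r"\w+")),
    ("whitespace", re.compile(r"\s+")),
]


def tokenize(text):
    """Yield (kind, lexeme) pairs; raise ValueError at the first unmatched character."""
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_SPEC:
            # Pattern.match(text, pos) only matches starting exactly at pos,
            # which plays the role of \G in the Perl version.
            m = pattern.match(text, pos)
            if m:
                yield kind, m.group()
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"tokenizing error at character {pos}")
```

Calling `list(tokenize("foo bar"))` walks the string left to right and collects one pair per token. Raising an exception on a bad character, rather than printing and stopping as the Perl loop does, is a design choice of this sketch.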


庸人自扰

2021-01-14 00:37
Have you looked at PyParsing?

From their homepage:

Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):
```
from pyparsing import Word, alphas
greet = Word( alphas ) + "," + Word( alphas ) + "!" # <-- grammar defined here
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))  # print() call form, as required by Python 3
```
The program outputs the following:
```
Hello, World! -> ['Hello', ',', 'World', '!']
```