Treetop basic parsing and regular expression usage

Question

I'm developing a script using the Ruby Treetop library and having trouble with its syntax for regexes. First off, many regular expressions that work in other settings don't work the same way in Treetop.

This is my grammar: (myline.treetop)

grammar MyLine
    rule line
        string whitespace condition
    end
    rule string
        [\S]*
    end
    rule whitespace
        [\s]*
    end
    rule condition
        "new" / "old" / "used"
    end
end

This is my usage: (usage.rb)

require 'rubygems'
require 'treetop'
require 'polyglot'
require 'myline'

parser = MyLineParser.new
p parser.parse("randomstring new")

This should find the word new, and it does. Now I want to extend it so that it can find new when the input string becomes "randomstring anotherstring new yetanother andanother", allowing any number of strings separated by whitespace (tabs included) before and after the match for rule condition. In other words, if I pass it any sentence containing the word "new" (or "old", or "used"), it should be able to match it.

So let's say I change my grammar to:

rule line
    string whitespace condition whitespace string
end

Then, it should be able to find a match for:

p parser.parse("randomstring new anotherstring")

So, what do I have to do to allow the string whitespace to be repeated before and after condition? If I try to write this:

rule line
    (string whitespace)* condition (whitespace string)*
end

With this, it goes into an infinite loop. If I replace the parentheses above with [], it returns nil. In general, ordinary regexes return a match when I use a pattern like the one above, but Treetop's don't. Does anyone have any tips or pointers on how to go about this? Also, since there isn't much documentation for Treetop and the examples are either too trivial or too complex, does anyone know of a more thorough guide?

Answer 1:

It looks like you don't even need a grammar to do what you're asking. A simple regex is sufficient in this case:

line.match(/(.*)\s(new|old|used)\s(.*)/)

(Example: http://rubular.com/r/Kl8rUifxeu )

You can get an array containing the stuff before and after the condition with:

Regexp.last_match(1).split + Regexp.last_match(3).split

And test the condition with:

return "Sweet, it's new!" if Regexp.last_match(2) == "new"
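The pieces above can be pulled together into one small helper. This is only a sketch: the method name split_on_condition and the hash it returns are illustrative assumptions, and the pattern uses \b word boundaries instead of the \s in the regex above so that a condition at the very start or end of the line also matches.

```ruby
# Hypothetical helper, not part of Treetop or the original answer.
# Finds the first standalone "new", "old" or "used" in a line and
# returns the words before it, the condition itself, and the words after it.
CONDITION_PATTERN = /\A(.*?)\b(new|old|used)\b(.*)\z/m

def split_on_condition(line)
  match = CONDITION_PATTERN.match(line)
  return nil unless match # no condition word anywhere in the line

  {
    before: match[1].split,  # whitespace-separated words before the condition
    condition: match[2],     # "new", "old" or "used"
    after: match[3].split    # whitespace-separated words after the condition
  }
end

result = split_on_condition("randomstring anotherstring new yetanother andanother")
p result
```

Because of the word boundaries, a line such as "renewed stock" yields nil here rather than matching the "new" inside "renewed".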

Answer 2:

This has nothing to do with Treetop and everything to do with your grammar. Anything the condition rule matches is also matched by your string rule, so it is ambiguous where the (string whitespace)* repetition should stop and condition should begin. Clean up your line rule so that the grammar is unambiguous and you'll be fine. You might want to make it so that things/attributes like condition are tagged as such:

cond:new

That is lexically different from the string rule.
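One way to read this suggestion, as an untested sketch rather than a definitive grammar: give condition a cond: prefix and keep ':' out of the string rule, so the two rules can no longer match the same text. The sketch also uses + instead of * so that neither string nor whitespace can match the empty string; a repetition over something that can match nothing is a plausible cause of the infinite loop described in the question. The cond: prefix and the [^\s:] character class are assumptions made for illustration.

```
grammar MyLine
    rule line
        (string whitespace)* condition (whitespace string)*
    end
    rule string
        # One or more characters that are neither whitespace nor ':' (assumed class)
        [^\s:]+
    end
    rule whitespace
        [\s]+
    end
    rule condition
        # Tagged form, e.g. cond:new
        "cond:" ("new" / "old" / "used")
    end
end
```

With a grammar along these lines, parser.parse("randomstring cond:new anotherstring") should return a syntax node rather than nil, while untagged input such as "randomstring new" would no longer match.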

Source: https://stackoverflow.com/questions/2404518/treetop-basic-parsing-and-regular-expression-usage

Tags

ruby

regex

parsing

treetop