treetop | 易学教程

Learning Treetop

Question: I'm trying to teach myself Ruby's Treetop grammar generator. I am finding that not only is the documentation woefully sparse for the "best" one out there, but that it doesn't seem to work as intuitively as I'd hoped. On a high level, I'd really love a better tutorial than the on-site docs or the video, if there is one. On a lower level, here's a grammar I cannot get to work at all:

    grammar SimpleTest
      rule num
        (float / integer)
      end
      rule float
        ( (( '+' / '-')? plain_digits '.' plain_digits) / (

Treetop basic parsing and regular expression usage

Question: I'm developing a script using the Ruby Treetop library and having issues with its syntax for regexes. First off, many regular expressions that work in other settings don't work the same way in Treetop.

This is my grammar (myline.treetop):

    grammar MyLine
      rule line
        string whitespace condition
      end
      rule string
        [\S]*
      end
      rule whitespace
        [\s]*
      end
      rule condition
        "new" / "old" / "used"
      end
    end

This is my usage (usage.rb):

    require 'rubygems'
    require 'treetop'
    require 'polyglot'
    require 'myline'

best way to parse plain text file with a nested information structure

Question: The text file has hundreds of these entries (the format is an MT940 bank statement):

    {1:F01AHHBCH110XXX0000000000}{2:I940X N2}{3:{108:XBS/091502}}{4:
    :20:XBS/091202/0001
    :25:5887/507004-50
    :28C:140/1
    :60F:C0914CHF7789,
    :61:0912021202D36,80NTRFNONREF//0887-1202-29-941 04392579-0 LUTHY + xxx, ZUR
    :86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232 2579-0
    :62F:C091202CHF52,2
    :64:C091302CHF52,2
    -}

This should go into an Array of Hashes like

    [{"1"=>"F01AHHBCH110XXX0000000000"}, "2"=>"I940X
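One possible starting point for the brace-delimited outer structure, sketched in plain Ruby with StringScanner rather than a Treetop grammar. The method names `parse_blocks` and `parse_statement_blocks` are hypothetical, and the sketch assumes every block has the shape `{tag:value}`, where the value is either raw text or further nested blocks. It does not split the `:20:`-style fields inside block 4.

```ruby
require 'strscan'

# Parses consecutive "{tag:value}" blocks starting at the scanner position.
# A value is either a list of nested blocks or raw text up to the next brace.
# (Hypothetical helper; the block shape is an assumption taken from the sample.)
def parse_blocks(scanner)
  blocks = []
  while scanner.scan(/\{(\w+):/)
    tag = scanner[1] # capture now; nested calls overwrite the match data
    value = if scanner.check(/\{/)
              parse_blocks(scanner)   # nested blocks, e.g. {3:{108:...}}
            else
              scanner.scan(/[^{}]*/)  # raw text, kept unparsed
            end
    scanner.scan(/\}/) or raise ArgumentError, "missing '}' at offset #{scanner.pos}"
    blocks << { tag => value }
    scanner.skip(/\s*/)
  end
  blocks
end

# Entry point: returns an Array of single-key Hashes, one per top-level block.
def parse_statement_blocks(text)
  parse_blocks(StringScanner.new(text))
end
```

Called on one entry, this yields one Hash per top-level block, with block 3 holding a nested Array and block 4 holding its body as a single String that could then be split on the `:NN:` field tags.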

Can I use Treetop to parse an IO?

Question: I've got a file that I want to parse with Treetop. If I wanted to parse the entire thing, I'd use

    rule document
      category_listing*
    end

I don't really want to read the entire file into memory at once. I know I can set up the parser to parse one category_listing at a time (using #consume_all_input = false and #root = :category_listing), which is half the problem. However, it looks like #parse expects to be passed a String (and it certainly fails when I try to pass it a File), which makes the idea of reading and parsing category_listing by category_listing sound like a PITA. Can Treetop only be
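If the records can be delimited cheaply before parsing, one workaround is to read the IO one record at a time and hand each record to the parser as a String. The sketch below is plain Ruby and makes two assumptions that are not in the question: records are separated by blank lines, and `parser` is any object responding to `call` (with Treetop, that would presumably wrap the generated parser's `parse` method). The names `each_record` and `parse_stream` are hypothetical.

```ruby
require 'stringio'

# Yields one record at a time from any IO-like object without loading the
# whole file. An empty separator puts each_line into "paragraph mode", so
# records are assumed to be separated by blank lines (an assumption).
def each_record(io, separator = "")
  io.each_line(separator) do |chunk|
    record = chunk.strip
    yield record unless record.empty?
  end
end

# Feeds each record to a parser object and collects the results.
# `parser` is a stand-in: anything that responds to #call with a String.
def parse_stream(io, parser)
  results = []
  each_record(io) { |record| results << parser.call(record) }
  results
end
```

Usage would look like `File.open(path) { |f| parse_stream(f, ->(text) { treetop_parser.parse(text) }) }`, where `path` and `treetop_parser` are placeholders. Only one record is held in memory at a time, apart from the collected results.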

recognize Ruby code in Treetop grammar

Question: I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:

    <% ruby_code_here %>
    <%= other_ruby_code %>

Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:

    rule erb_tag
      "<%" ruby_code "%>" {
        def content
          ...
        end
      }
    end

Where ruby_code is handled by some rules
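One way to avoid writing Ruby grammar rules at all is to treat the tag body as opaque text up to the next "%>" and pass it to Ruby's own parser afterwards. The sketch below does this in plain Ruby with StringScanner and the standard library's Ripper. `split_erb` and `ruby_breakdown` are hypothetical names, and the sketch assumes that "%>" never appears inside the embedded Ruby (for example in a string literal) and ignores trim markers such as "-%>".

```ruby
require 'ripper'
require 'strscan'

# Splits ERB source into [:text, str], [:code, str] and [:output, str] parts.
# Everything between "<%" and the next "%>" is taken as opaque Ruby source.
def split_erb(source)
  scanner = StringScanner.new(source)
  parts = []
  until scanner.eos?
    if scanner.scan(/<%(=?)(.*?)%>/m)
      kind = scanner[1] == "=" ? :output : :code
      parts << [kind, scanner[2].strip]
    elsif (text = scanner.scan(/(?:(?!<%).)+/m))
      # Same idea as the PEG idiom (!'<%' .)+ : consume until a tag opens.
      parts << [:text, text]
    else
      # An unterminated "<%": keep the remainder as plain text and stop.
      parts << [:text, scanner.rest]
      scanner.terminate
    end
  end
  parts
end

# Returns Ripper's S-expression for a snippet, or nil if the snippet is not
# valid Ruby on its own (e.g. "if x" without its matching "end").
def ruby_breakdown(code)
  Ripper.sexp(code)
end
```

Note that a single tag is often not a complete Ruby program: the body of `<% if x %>` only parses together with its later `<% end %>`, so `ruby_breakdown` returns nil for it. Joining the code parts back into one script before parsing is one way around that.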

Non-greedy matching in Treetop/PEG?

Question: How would I do something like this in Treetop?

    /.+?;/

It seems like the only way is to do:

    [^;]+ ';'

which is kind of ugly. Is there any other way? .+? doesn't seem to work.

Answer 1: PEGs are greedy and blind by default, which means they consume as much input as they can and do not consider what comes afterwards:

    S <- P1* P2      (greedy, blind)

That can be fixed fairly easily by using ordered choice (and without using lookaheads):

    S <- P1 S / P2   (greedy, non-blind)
    S <- P2 / P1 S   (lazy, non-blind)

Well, I learnt PEGs are greedy, and there's no way around it. Lookaheads can be used to mimic
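To see why the blind form fails and the lazy form works, here is a minimal sketch in plain Ruby (no Treetop required) that hand-codes the two rules as recursive matchers. It assumes P1 is "any single character" and P2 is the literal ';'. All method names are made up for illustration. Each matcher returns the position after the match, or nil on failure.

```ruby
# P1 <- .    (any single character)
def p1(input, pos)
  pos < input.length ? pos + 1 : nil
end

# P2 <- ';'
def p2(input, pos)
  input[pos] == ';' ? pos + 1 : nil
end

# S <- P1* P2    (greedy, blind)
# P1* consumes every remaining character and never gives any back,
# so P2 is always attempted at end of input and fails.
def greedy_blind(input, pos)
  while (nxt = p1(input, pos))
    pos = nxt
  end
  p2(input, pos)
end

# S <- P2 / P1 S    (lazy, non-blind)
# The terminator is tried first at every position; only if it fails
# is one more character consumed before recursing.
def lazy(input, pos)
  p2(input, pos) || ((nxt = p1(input, pos)) && lazy(input, nxt))
end
```

On the input "ab;cd;", `greedy_blind` returns nil while `lazy` returns 3, that is, it stops right after the first semicolon. Strictly speaking this lazy rule corresponds to /.*?;/; requiring at least one character first, as in /.+?;/, would mean starting with one mandatory P1.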

PEG for Python style indentation

Question: How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) which can handle Python/Haskell/CoffeeScript-style indentation? Examples of a not-yet-existing programming language: square x = x * x cube x = x * square x fib n = if n <= 1 0 else fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets Update: Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following: foo bar = 1 baz = 2 tap zap = 3 # should yield (ruby style hashmap): #
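A common way to make indentation tractable for a PEG is to run a small pre-pass that turns changes in leading whitespace into explicit indent and dedent markers, which the grammar can then treat like opening and closing brackets. The sketch below shows only that pre-pass, in plain Ruby. `indent_tokens` is a hypothetical name, and it assumes indentation uses spaces only. The two-space layout in the usage note is an assumed layout for the foo/bar example.

```ruby
# Converts indentation changes into explicit [:indent] / [:dedent] tokens.
# Each non-blank line becomes [:line, text] with its indentation removed.
# Assumes indentation consists of spaces only (tabs are not handled).
def indent_tokens(source)
  stack = [0] # indentation widths of the currently open blocks
  tokens = []
  source.each_line do |line|
    next if line.strip.empty?
    width = line[/\A */].length
    if width > stack.last
      stack.push(width)
      tokens << [:indent]
    end
    while width < stack.last
      stack.pop
      tokens << [:dedent]
    end
    # After popping, the width must match an enclosing block exactly.
    raise ArgumentError, "inconsistent dedent: #{line.inspect}" unless width == stack.last
    tokens << [:line, line.strip]
  end
  # Close any blocks still open at end of input.
  (stack.length - 1).times { tokens << [:dedent] }
  tokens
end
```

For the input "foo\n  bar = 1\n  baz = 2\ntap\n  zap = 3\n", the bar and baz lines end up between one indent/dedent pair under foo, and zap between another under tap. A grammar rule along the lines of `block <- line (INDENT block+ DEDENT)?` could then build the nested structure.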