Library to parse ERB files

耶瑟儿~ 2021-02-10 10:25

I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri-like manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic con…

2 Answers
  • 2021-02-10 10:56

    I recently had a similar problem. The approach I took was to write a small script (erblint.rb) that does a string substitution to convert the ERB tags (<% %> and <%= %>) to XML tags and then parses the result using Nokogiri.

    The following code shows what I mean:

    #!/usr/bin/env ruby
    require 'nokogiri'
    
    # This is a simple program that reads in a Ruby ERB file and parses
    # it as an XHTML file. Specifically, it makes a decent attempt at
    # converting the ERB tags (<%= %> and <% %>) to XML tags (<erb-disp/>
    # and <erb-eval/>, respectively).
    #
    # Once the document has been parsed, it will be validated and any
    # error messages will be displayed.
    #
    # More complex option and error handling is left as an exercise to the user.
    
    abort "Usage: #{File.basename($PROGRAM_NAME)} <filename>" if ARGV.empty?
    
    filename = ARGV[0]
    
    begin
      puts "\n*** Parsing #{filename} ***\n\n"
      s = File.read(filename)
    
      # Substitute the standard ERB tags to convert them to XML tags
      #   <%= ... %> for <erb-disp> ... </erb-disp>
      #   <% ... %>  for <erb-eval> ... </erb-eval>
      # The <%= rule must run first; otherwise the generic <% rule would
      # also match output tags.
      #
      # Note that this won't work for more complex expressions such as:
      #   <a href=<% @some_object.generate_url -%> >link text</a>
      # Of course, this is not great style, anyway... Ruby code containing
      # XML special characters (e.g. <% if a < b %>) will also be reported
      # as a parse error, because it is copied into the XML unescaped.
      s.gsub!(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>')
      s.gsub!(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')
    
      doc = Nokogiri::XML(s) do |config|
        # put more config options here if required
        # config.strict
      end
    
      puts doc.to_xhtml(indent: 2, encoding: 'UTF-8')
    
      if doc.errors.empty?
        puts 'Huzzah, no errors!'
      else
        # Print each error message
        doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
      end
    rescue SystemCallError => e
      # Only file-system errors (missing file, permissions, ...) are rescued
      abort "Oops! Cannot open #{filename}: #{e.message}"
    end
    

    I've posted this as a gist on GitHub: https://gist.github.com/787145
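One limitation of the plain `gsub!` substitution is that Ruby code inside the tags often contains characters that are special in XML (for example `<% if a < b %>` or `<%= x && y %>`), which Nokogiri then reports as errors. A minimal sketch, assuming that escaping the captured code is acceptable for your use, is to run the substitution with a block and escape what it captures. The names `erb_to_xml`, `escape_xml`, and `XML_ESCAPES` are hypothetical, and only the Ruby standard library is used:

```ruby
# Hypothetical helpers (not part of the script above): convert ERB tags
# to XML elements while escaping the embedded Ruby code, so characters
# such as <, > and & do not break the XML.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_xml(code)
  code.gsub(/[&<>]/, XML_ESCAPES)
end

def erb_to_xml(source)
  # Output tags (<%= ... %>) are converted before the generic <% ... %>
  # rule; an optional trailing "-" (trim marker) is dropped.
  source
    .gsub(/<%=(.+?)-?%>/m) { "<erb-disp>#{escape_xml(Regexp.last_match(1))}</erb-disp>" }
    .gsub(/<%(.+?)-?%>/m) { "<erb-eval>#{escape_xml(Regexp.last_match(1))}</erb-eval>" }
end

puts erb_to_xml('<p><% if a < b %><%= x && y %><% end -%></p>')
```

Feeding the result of `erb_to_xml` to `Nokogiri::XML` instead of the raw `gsub!` output should leave only genuine markup problems in `doc.errors`; the escaped code is decoded again when you read a node's text.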

  • 2021-02-10 11:05

    I eventually solved this problem by using RLex (http://raa.ruby-lang.org/project/ruby-lex/), the Ruby version of lex, with the following grammar:

    %{
    
    #define NUM 257
    
    #define OPTOK 258
    #define IDENT 259
    #define OPETOK 260
    #define CLSTOK 261
    #define CLTOK 262
    #define FLOAT 263
    #define FIXNUM 264
    #define WORD 265
    #define STRING_DOUBLE_QUOTE 266
    #define STRING_SINGLE_QUOTE 267
    
    #define TAG_START 268
    #define TAG_END 269
    #define TAG_SELF_CONTAINED 270
    #define ERB_BLOCK_START 271
    #define ERB_BLOCK_END 272
    #define ERB_STRING_START 273
    #define ERB_STRING_END 274
    #define TAG_NO_TEXT_START 275
    #define TAG_NO_TEXT_END 276
    #define WHITE_SPACE 277
    %}
    
    digit   [0-9]
    blank   [ ]
    letter  [A-Za-z]
    name1   [A-Za-z_]
    name2   [A-Za-z_0-9]
    valid_tag_character [A-Za-z0-9"'=@_():/ ] 
    ignore_tags style|script
    %%
    
    {blank}+"\n"                  { return [ WHITE_SPACE, yytext ] } 
    "\n"{blank}+                  { return [ WHITE_SPACE, yytext ] } 
    {blank}+"\n"{blank}+                  { return [ WHITE_SPACE, yytext ] } 
    
    "\r"                  { return [ WHITE_SPACE, yytext ] } 
    "\n"            { return[ yytext[0], yytext[0..0] ] };
    "\t"            { return[ yytext[0], yytext[0..0] ] };
    
    ^{blank}+       { return [ WHITE_SPACE, yytext ] }
    
    {blank}+$       { return [ WHITE_SPACE, yytext ] };
    
    ""   { return [ TAG_NO_TEXT_START, yytext ] }
    ""  { return [ TAG_NO_TEXT_END, yytext ] }
    ""                   { return [ TAG_SELF_CONTAINED, yytext ] }
    ""  { return [ TAG_SELF_CONTAINED, yytext ] }
    ""    { return [ TAG_START, yytext ] }
    ""   { return [ TAG_END, yytext ] }
    
    ""  { return [ ERB_BLOCK_END, yytext ] }
    ""  { return [ ERB_STRING_END, yytext ] }
    
    
    {letter}+       { return [ WORD, yytext ] }
    
    
    \".*\"          { return [ STRING_DOUBLE_QUOTE, yytext ] }
    '.*'                    { return [ STRING_SINGLE_QUOTE, yytext ] }
    .           { return [ yytext[0], yytext[0..0] ] }
    
    %%
    

    This is not a complete grammar, but it worked for my purposes: locating and re-emitting text. I combined that grammar with this small piece of code:

    # MakeYourOwnCallbackHandler stands for your own class implementing the
    # callback methods used below. The token constants (WORD, TAG_START, ...)
    # are assumed to be visible here with the values #defined in the grammar.
    text_handler = MakeYourOwnCallbackHandler.new
    
    File.open(file_name, "r") do |input|
      l = Erblex.new
      l.yyin = input
    
      loop do
        a, v = l.yylex
        break if a == 0 # token code 0 marks the end of input
    
        if a < WORD
          # Single characters: the catch-all rules return yytext[0] as the code
          text_handler.character(v.to_s, a)
        else
          case a
          when WORD
            text_handler.text(v.to_s)
          when TAG_START
            text_handler.start_tag(v.to_s)
          when TAG_END
            text_handler.end_tag(v.to_s)
          when WHITE_SPACE
            text_handler.white_space(v.to_s)
          when ERB_BLOCK_START
            text_handler.erb_block_start(v.to_s)
          when ERB_BLOCK_END
            text_handler.erb_block_end(v.to_s)
          when ERB_STRING_START
            text_handler.erb_string_start(v.to_s)
          when ERB_STRING_END
            text_handler.erb_string_end(v.to_s)
          when TAG_NO_TEXT_START
            text_handler.ignorable_tag_start(v.to_s)
          when TAG_NO_TEXT_END
            text_handler.ignorable_tag_end(v.to_s)
          when STRING_DOUBLE_QUOTE
            text_handler.string_double_quote(v.to_s)
          when STRING_SINGLE_QUOTE
            text_handler.string_single_quote(v.to_s)
          when TAG_SELF_CONTAINED
            text_handler.tag_self_contained(v.to_s)
          end
        end
      end
    end
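If a lex generator such as RLex is not available, the same token-stream idea can be approximated with Ruby's standard `StringScanner`. The sketch below is a swapped-in technique, not the RLex grammar above: the token names only loosely mirror its `#define`s, the patterns are simplified assumptions (no quoted strings, no special handling of `script`/`style`, and a tag with ERB inside an attribute falls back to character tokens), and `each_token` and `TextCollector` are hypothetical stand-ins for `Erblex` and `MakeYourOwnCallbackHandler`:

```ruby
require 'strscan'

# Token rules for the text *outside* ERB tags, tried in order; the first
# match wins. These are simplified assumptions, not the RLex grammar.
MARKUP_RULES = [
  [:tag_end,            %r{</[^<>]*>}],
  [:tag_self_contained, %r{<[^<>]*/>}],
  [:tag_start,          /<[^<>]*>/],
  [:word,               /[A-Za-z]+/],
  [:white_space,        /\s+/],
  [:character,          /./m]
].freeze

# Yields [type, text] pairs for an ERB source string.
def each_token(source)
  scanner = StringScanner.new(source)
  until scanner.eos?
    if (opener = scanner.scan(/<%=?/))
      output = (opener == '<%=')
      yield(output ? :erb_string_start : :erb_block_start, opener)
      # Everything up to the closing -%> or %> is taken verbatim as Ruby
      # code, so a "<" inside the code is never read as an HTML tag.
      code = scanner.scan(/.*?(?=-?%>)/m) or raise ArgumentError, 'unterminated ERB tag'
      yield(:erb_code, code)
      yield(output ? :erb_string_end : :erb_block_end, scanner.scan(/-?%>/))
    else
      type, = MARKUP_RULES.find { |_, pattern| scanner.scan(pattern) }
      yield(type, scanner.matched)
    end
  end
end

# Hypothetical callback handler: re-emits the text outside tags and
# counts ERB tags, ignoring HTML markup and the Ruby code itself.
class TextCollector
  attr_reader :text, :erb_tags

  def initialize
    @text = +''
    @erb_tags = 0
  end

  def handle(type, value)
    case type
    when :word, :white_space, :character then @text << value
    when :erb_block_start, :erb_string_start then @erb_tags += 1
    end
  end
end

handler = TextCollector.new
each_token('<p>Hello <%= user.name %>, welcome!</p>') { |type, value| handler.handle(type, value) }
p handler.text     # => "Hello , welcome!"
p handler.erb_tags # => 1
```

The main difference from a pure lex grammar is the small state change at `<%`: once an ERB tag opens, everything up to the next `%>` is consumed as Ruby code, which keeps `<% if a < b %>` from being tokenized as an HTML tag.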
    