I am attempting to parse, not evaluate, Rails ERB files in an Hpricot/Nokogiri-style manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic con
I recently had a similar problem. The approach I took was to write a small script (erblint.rb) that uses string substitution to convert the ERB tags (<% %> and <%= %>) to XML tags, and then parses the result using Nokogiri.
The following script shows what I mean:
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
# This is a simple program that reads in a Ruby ERB file, and parses
# it as an XHTML file. Specifically, it makes a decent attempt at
# converting the ERB tags (<%= %> and <% %>) to XML tags (<erb-disp>
# and <erb-eval>, respectively).
#
# Once the document has been parsed, it will be validated and any
# error messages will be displayed.
#
# More complex option and error handling is left as an exercise to the user.
abort 'Usage: erblint.rb <filename>' if ARGV.empty?
filename = ARGV[0]
begin
  doc = ""
  File.open(filename) do |file|
    puts "\n*** Parsing #{filename} ***\n\n"
    s = file.read
    # Substitute the standard ERB tags to convert them to XML tags:
    #   <%= ... %> becomes <erb-disp> ... </erb-disp>
    #   <% ... %>  becomes <erb-eval> ... </erb-eval>
    #
    # Note that this won't work for more complex expressions such as:
    #   <a href=<%= ... %>>link text</a>
    # Of course, this is not great style, anyway...
    s.gsub!(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>')
    s.gsub!(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')
    doc = Nokogiri::XML(s) do |config|
      # put more config options here if required
      # config.strict
    end
  end
  puts doc.to_xhtml(:indent => 2, :encoding => 'UTF-8')
  puts "Huzzah, no errors!" if doc.errors.empty?
  # Otherwise, print each error message
  doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
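rescue SystemCallError => e
  # Only file-system failures (missing file, bad permissions) are reported here
  abort "Oops! Cannot open #{filename}: #{e.message}"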
end
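
To see the substitution step in isolation, here is a minimal sketch that needs nothing beyond core Ruby. The helper name `erb_to_xml` and the element names `erb-disp` and `erb-eval` are assumptions chosen for illustration; any names that do not clash with your real markup should work equally well.

```ruby
# Hypothetical helper: rewrites ERB tags as XML elements so that an XML
# parser can treat the template as an ordinary document.
# The element names erb-disp and erb-eval are illustrative assumptions.
def erb_to_xml(source)
  source
    .gsub(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>') # output tags first
    .gsub(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')  # then the remaining tags
end

template = <<~ERB
  <ul>
  <% @items.each do |item| %>
  <li><%= item.name %></li>
  <% end %>
  </ul>
ERB

puts erb_to_xml(template)
```

The order of the two substitutions matters: `<%= ... %>` has to be replaced first, because the second pattern would otherwise match it as well and keep the leading `=` as part of the captured code.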
I've posted the script as a gist on GitHub: https://gist.github.com/787145
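
The script's comments point out that the substitution breaks down when an ERB tag sits inside an HTML tag, for example in an attribute value. One way to catch those templates before parsing might be a rough pre-check like the one below. It is only a heuristic sketch, and the constant `ERB_INSIDE_TAG` and the method `erb_inside_tag?` are hypothetical names, not part of the original script.

```ruby
# Hypothetical pre-check: matches an HTML tag that has been opened
# ("<" followed by a letter) and contains "<%" before its closing ">".
ERB_INSIDE_TAG = /<[a-zA-Z][^<>]*<%/

def erb_inside_tag?(source)
  source.match?(ERB_INSIDE_TAG)
end

# ERB between tags: safe to substitute
puts erb_inside_tag?('<li><%= item.name %></li>')

# ERB inside an attribute: substitution would yield malformed XML
puts erb_inside_tag?('<a href="<%= item.url %>">link text</a>')
```

Templates flagged this way could be reported and skipped, or handled by a different substitution that keeps the ERB code inside the attribute value.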