HTML tidy/cleaning in Ruby 1.9

喜欢而已 submitted on 2019-11-28 07:47:15

tidy_ffi works with Ruby 1.9 (latest version): http://github.com/libc/tidy_ffi/blob/master/README.rdoc

If you are working on Windows, you need to set the library_path, e.g.:

    require 'tidy_ffi'
    TidyFFI.library_path = 'lib\\tidy\\bin\\tidy.dll'
    tidy = TidyFFI::Tidy.new('test')
    puts tidy.clean

(It uses the same DLL as tidy.) The link above gives more usage examples.
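
If the same script has to run on more than one platform, the library path can be picked at run time. This is only a rough sketch: the helper name `tidy_library_path` is made up here, the Windows and macOS paths are the ones quoted in this thread, and the Linux path is an assumption that will differ between systems.

```ruby
require 'rbconfig'

# Hypothetical helper (not part of tidy_ffi): choose a tidy library path
# based on the host OS string reported by RbConfig.
def tidy_library_path(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /mswin|mingw|cygwin/i
    'lib\\tidy\\bin\\tidy.dll'        # Windows path from the example above
  when /darwin/i
    '/opt/local/lib/libtidy.dylib'    # macOS path used elsewhere in this thread
  else
    '/usr/lib/libtidy.so'             # assumed Linux location, adjust as needed
  end
end

puts tidy_library_path

# With tidy_ffi installed, the result would be assigned like this:
#   TidyFFI.library_path = tidy_library_path
```

Passing the OS string as an argument keeps the helper easy to check without switching machines.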

I am using Nokogiri to fix invalid HTML:

    require 'nokogiri'
    Nokogiri::HTML::DocumentFragment.parse(html).to_html

Here is a nice example of how to make your HTML look better using tidy:

    require 'tidy'
    Tidy.path = '/opt/local/lib/libtidy.dylib' # or wherever your tidylib resides

    nice_html = ""
    Tidy.open(:show_warnings => true) do |tidy|
      tidy.options.output_xhtml = true
      tidy.options.wrap = 0
      tidy.options.indent = 'auto'
      tidy.options.indent_attributes = false
      tidy.options.indent_spaces = 4
      tidy.options.vertical_space = false
      tidy.options.char_encoding = 'utf8'
      # my_nasty_html_string holds the HTML you want to clean
      nice_html = tidy.clean(my_nasty_html_string)
    end

    # remove excess newlines
    nice_html = nice_html.strip.gsub(/\n+/, "\n")
    puts nice_html
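
To see what the final clean-up step does on its own, here is a small sketch that runs without tidy. The helper name `squeeze_newlines` and the sample string are made up for illustration.

```ruby
# Hypothetical helper wrapping the clean-up line from the example above:
# trim leading/trailing whitespace, then collapse runs of newlines into one.
def squeeze_newlines(html)
  html.strip.gsub(/\n+/, "\n")
end

sample = "\n\n<html>\n\n\n<body>\n<p>hi</p>\n\n</body>\n</html>\n\n"
puts squeeze_newlines(sample)
```

Note that `/\n+/` only matches consecutive newline characters, so a line that contains nothing but spaces is left in place.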

For more tidy options, check out the man page.

Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9.

Watch out: the Ruby Tidy bindings have some nasty memory leaks. They are currently unusable in long-running processes. (For the record, I'm using http://github.com/ak47/tidy.)

I just had to remove it from a production Rails 2.3 application because it was leaking about 1MB/min.
