Nokogiri builder performance on huge XML?

放肆的年华 submitted on 2019-12-11 01:09:52

Question


I need to build a huge XML file, about 1-50 MB. I thought Nokogiri's builder would be efficient enough, and it is, somewhat. The problem is that after the program reaches its last line it doesn't exit immediately: Ruby keeps doing something for several seconds (garbage collection, maybe?), and only then does the program finally end.

To give a real example, I measured the time it takes to build an XML file. The build finishes after 55 seconds (there is a database behind it, so it takes a while), but once the XML is built Ruby keeps working for about 15 more seconds with the CPU at full load.

The code (partly real, partly pseudocode) is as follows:

...
builder = Nokogiri::XML::Builder.with(doc) do |xml|
  build_node(xml)
end
...

def build_node(xml)
  ...
  xml["#{namespace}"] if namespace  # select the namespace prefix for the next element
  xml.send("#{elem_name}", attrs_hash) do |elem_xml|
    ...
    if has_children
      if type
        # Leaf content: plain text, comment, or CDATA
        case type
        when XML::TextContent::PLAIN
          elem_xml.text text_content
        when XML::TextContent::COMMENT
          elem_xml.comment text_content
        when XML::TextContent::CDATA
          elem_xml.cdata text_content
        end
      else
        # Otherwise recurse into the child nodes
        build_node(elem_xml)
      end
    end
  end
end
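To make the recursion above runnable outside the original program (no database, no Nokogiri), here is a minimal sketch of the same dispatch over an in-memory tree. The Node struct and the TextContent constants are hypothetical stand-ins for the question's own data and XML::TextContent; unlike build_node, this sketch writes markup directly to an IO instead of building a DOM, so treat it as an illustration of the control flow, not of Nokogiri's behavior.

```ruby
require 'stringio'

# Hypothetical stand-ins for the question's XML::TextContent constants.
module TextContent
  PLAIN   = :plain
  COMMENT = :comment
  CDATA   = :cdata
end

# Hypothetical tree node: leaf content (type + text) or nested children.
Node = Struct.new(:name, :namespace, :attrs, :type, :text, :children,
                  keyword_init: true)

# Writes one node to io, mirroring build_node's dispatch. Assumes comment
# text contains no "--" and CDATA text contains no "]]>".
def write_node(io, node)
  tag = node.namespace ? "#{node.namespace}:#{node.name}" : node.name.to_s
  attrs = (node.attrs || {}).map { |k, v| " #{k}=#{v.to_s.encode(xml: :attr)}" }.join
  io << "<#{tag}#{attrs}>"
  case node.type
  when TextContent::PLAIN   then io << node.text.to_s.encode(xml: :text)
  when TextContent::COMMENT then io << "<!--#{node.text}-->"
  when TextContent::CDATA   then io << "<![CDATA[#{node.text}]]>"
  else
    # No leaf type: recurse into the children
    (node.children || []).each { |child| write_node(io, child) }
  end
  io << "</#{tag}>"
end

tree = Node.new(name: "root", attrs: { "xmlns:ns" => "urn:example" }, children: [
  Node.new(name: "a", type: TextContent::PLAIN, text: "x < y"),
  Node.new(name: "b", namespace: "ns", type: TextContent::CDATA, text: "raw <data>")
])
out = StringIO.new
write_node(out, tree)
puts out.string
```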

Note that I previously used a different approach built on my own class structure. The build speed was the same, but the program exited normally after the last line. Now I am required to use Nokogiri, so I need to find a solution.

What can I do to avoid this extra overhead of several seconds after the XML is built? Is it even possible?

UPDATE:

Thanks to a suggestion from Adiel Mittmann, I was able to locate the problem while creating a minimal working example. I now have a small (well, not that small) example that demonstrates it.

The following code is causing the problem:

xml.send("#{elem_name}_") do |elem_xml|
  ...
  elem_xml.text text_content #This line is the problem
  ...
end

According to Nokogiri's documentation, that line ends up executing the following code:

def create_text_node string, &block
  Nokogiri::XML::Text.new string.to_s, self, &block
end

So it is the text-node creation code that runs. What exactly is happening here?

UPDATE 2:

After some more experiments, I found that the problem can be reproduced easily with:

require 'nokogiri'

builder = Nokogiri::XML::Builder.new do |xml|
  0.upto(81900) do
    xml.text "test"
  end
end
puts "End"

So is it really Nokogiri itself? Do I have any options?


Answer 1:


Your example also takes a long time to run on my machine. And you were right: it's the garbage collector that takes so long. Try this:

require 'nokogiri'
class A
  def a
    builder = Nokogiri::XML::Builder.new do |xml|
      0.upto(81900) do
        xml.text "test"
      end
    end
  end
end
A.new.a
puts "End1"
GC.start
puts "End2"

Here, the delay happens between "End1" and "End2". After "End2" is printed, the program closes immediately.

Notice that I wrapped the builder in a method of an object to demonstrate this. Otherwise, at the top level, the builder variable stays in scope, so the data generated by the builder can only be garbage-collected when the program finishes.
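To measure how much of the delay is garbage collection, a sketch along the lines of the test above can time an explicit GC.start separately from the build. It uses only the standard library (Process.clock_gettime, GC.stat, GC::Profiler). The build_garbage method is a hypothetical stand-in that allocates many small strings; a plain Ruby workload will not reproduce Nokogiri's per-node costs, so swap in the real builder call to measure the actual case.

```ruby
# Monotonic timer helper: returns [block result, elapsed seconds].
def timed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0]
end

# Hypothetical workload standing in for the Nokogiri build. Everything it
# allocates is local to the method, so it is unreachable once it returns.
def build_garbage(n)
  parts = Array.new(n) { |i| "test#{i}" }
  parts.size
end

GC::Profiler.enable
count_before = GC.stat(:count)

size, build_secs = timed { build_garbage(500_000) }
_, gc_secs = timed { GC.start }  # force a full collection now, not at exit

puts format("built %d items in %.3fs", size, build_secs)
puts format("explicit GC.start took %.3fs", gc_secs)
puts format("GC runs: %d, profiled GC time: %.3fs",
            GC.stat(:count) - count_before, GC::Profiler.total_time)
GC::Profiler.disable
```

If the explicit GC.start accounts for most of the wait, the work has merely moved from program exit to a point you control; it has not disappeared.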

As for the best way to accomplish what you're trying to do, I suggest asking another question with details about what exactly you're doing with the XML files.




Answer 2:


Try using Ruby's "built-in" (sic) Builder library. I use it to generate large XML files as well, and it has a very small footprint.
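A markup writer like Builder's Builder::XmlMarkup (which accepts a target: IO) stays small because it appends text to its output as it goes instead of keeping a node object for every element and text run. The sketch below hand-rolls that idea with only the standard library so it runs without extra gems; StreamWriter and its tag/text methods are hypothetical and are not Builder's actual API.

```ruby
require 'stringio'

# Hypothetical minimal streaming writer: markup goes straight to the
# target IO, so no per-node objects are retained after they are written.
class StreamWriter
  def initialize(target)
    @target = target
  end

  # Writes <name attrs>, yields for nested content, then writes </name>.
  def tag(name, attrs = {})
    attr_str = attrs.map { |k, v| " #{k}=#{v.to_s.encode(xml: :attr)}" }.join
    @target << "<#{name}#{attr_str}>"
    yield self if block_given?
    @target << "</#{name}>"
    self
  end

  # Writes escaped character data.
  def text(str)
    @target << str.to_s.encode(xml: :text)
    self
  end
end

# The question's reproduction against the hypothetical writer:
# 0.upto(81900) gives 81,901 text runs inside a single root element.
out = StringIO.new
w = StreamWriter.new(out)
w.tag("root") do |x|
  0.upto(81900) { x.text "test" }
end
puts out.string.bytesize
```

Passing a File opened for writing instead of a StringIO keeps memory roughly flat regardless of document size, at the cost of not being able to revisit or modify earlier nodes.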



Source: https://stackoverflow.com/questions/9731089/nokogiri-builder-performance-on-huge-xml
