Question
I need to build a huge XML file, about 1-50 MB. I thought Nokogiri's Builder would be efficient enough, and it mostly is. The problem is that after the program reaches its last line it doesn't exit immediately: Ruby keeps doing something for several seconds (garbage collection, maybe?) before the program finally ends.
To give a real example, I measured the time it takes to build an XML file. The program reports 55 seconds when the XML is built (there is a database behind it, so it takes a while), but Ruby then keeps working for about 15 more seconds with the CPU pegged.
The pseudo/real code is as follows:
...
builder = Nokogiri::XML::Builder.with(doc) do |xml|
  build_node(xml)
end
...
def build_node(xml)
  ...
  xml["#{namespace}"] if namespace
  xml.send("#{elem_name}", attrs_hash) do |elem_xml|
    ...
    if has_children
      if type
        case type
        when XML::TextContent::PLAIN
          elem_xml.text text_content
        when XML::TextContent::COMMENT
          elem_xml.comment text_content
        when XML::TextContent::CDATA
          elem_xml.cdata text_content
        end
      else
        build_node(elem_xml)
      end
    end
  end
end
Note that I previously used a different approach based on my own class structure. The build speed was the same, but the program ended normally right after the last line. Now I am forced to use Nokogiri, so I have to find a solution.
What can I do to avoid that overhead of several seconds after the XML is built? Is it even possible?
UPDATE:
Thanks to a suggestion from Adiel Mittmann, I was able to locate the problem while creating a minimal working example. I now have a small (well, not that small) example that demonstrates it.
The following code is causing the problem:
xml.send("#{elem_name}_") do |elem_xml|
  ...
  elem_xml.text text_content # This line is the problem
  ...
end
According to Nokogiri's documentation, that line executes the following code:
def create_text_node string, &block
  Nokogiri::XML::Text.new string.to_s, self, &block
end
The text node creation code is then executed. So, what exactly is happening here?
UPDATE 2:
After some more experiments, I found that the problem can be easily reproduced with:
builder = Nokogiri::XML::Builder.new do |xml|
  0.upto(81900) do
    xml.text "test"
  end
end
puts "End"
So is it really Nokogiri itself? Is there any option for me?
Answer 1:
Your example also takes a long time to finish on my machine. And you were right: it's the garbage collector that takes so long. Try this:
require 'nokogiri'
class A
  def a
    builder = Nokogiri::XML::Builder.new do |xml|
      0.upto(81900) do
        xml.text "test"
      end
    end
  end
end
A.new.a
puts "End1"
GC.start
puts "End2"
Here, the delay happens between "End1" and "End2". After "End2" is printed, the program closes immediately.
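Notice that I wrapped the builder in a method on an object to demonstrate this: once the method returns, nothing references the builder's data anymore, so GC.start can collect it. Otherwise, the data generated by the builder can only be garbage collected when the program finishes.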
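To see how the time splits between building and collecting, the two phases can be timed separately. The following is only a minimal sketch and not part of the original answer: `time_build_and_gc` and `monotonic_now` are hypothetical helpers, and the block allocates plain strings as a stand-in for the Nokogiri builder call, so the numbers it prints will not match the Nokogiri case.

```ruby
# Hypothetical helper: read a monotonic clock in seconds.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper (not part of Nokogiri): times the given block,
# then separately times a forced garbage collection run.
def time_build_and_gc
  started = monotonic_now
  yield
  built = monotonic_now
  GC.start
  collected = monotonic_now
  { build: built - started, gc: collected - built }
end

timings = time_build_and_gc do
  # Stand-in workload (assumption): plain strings, not Nokogiri text nodes.
  # Replace this with the Nokogiri::XML::Builder.new call to measure the real case.
  Array.new(81_901) { |i| "test #{i}" }
  nil
end

puts format('build: %.3fs, GC: %.3fs', timings[:build], timings[:gc])
```

Because the work runs inside a block, whatever it allocates is no longer referenced once the block returns, which mirrors the reason for wrapping the builder in a method above.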
As for the best way to do what you're trying to accomplish, I suggest you ask another question giving details of what exactly you're trying to do with the XML files.
Answer 2:
Try using the Ruby built-in (sic) Builder. I use it to generate large XML files as well, and it has such a small footprint.
Source: https://stackoverflow.com/questions/9731089/nokogiri-builder-performance-on-huge-xml