Question
How do I deserialize in Psych to return an existing object, such as a class object?
To do serialization of a class, I can do
require "psych"
class Class
  yaml_tag 'class'
  def encode_with(coder)
    coder.represent_scalar 'class', name
  end
end
yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n"
but if I try calling Psych.load on that, I get an anonymous class, rather than the String class.
The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I want the String class.
Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where, rather than modifying existing objects with init_with, it makes sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.
Am I doomed to working with low level or medium level emitting, or am I missing something?
Background
I'll describe the project, why it needs classes, and why it needs (de)serialization.
Project
The Small Eigen Collider aims to create random tasks for Ruby to run. The initial aim was to see if the different implementations of Ruby (for example, Rubinius and JRuby) returned the same results when given the same random tasks, but I've found that it's also good for detecting ways to segfault Rubinius and YARV.
Each task is composed of the following:
receiver.send(method_name, *parameters, &block)
where receiver is a randomly chosen object, method_name is the name of a randomly chosen method, and *parameters is an array of randomly chosen objects. &block is not very random - it's basically equivalent to {|o| o.inspect}.
For example, if receiver was "a", method_name was :casecmp, and parameters was ["b"], then you'd be calling
"a".send(:casecmp, "b") {|x| x.inspect}
which is equivalent to (since the block is irrelevant)
"a".casecmp("b")
The Small Eigen Collider runs this code and logs the inputs along with the return value. In this example, most implementations of Ruby return -1, but at one stage Rubinius returned +1. (I filed this as a bug, https://github.com/evanphx/rubinius/issues/518 , and the Rubinius maintainers fixed it.)
Why it needs classes
I want to be able to use class objects in my Small Eigen Collider. Typically, they would be the receiver, but they could also be one of the parameters.
For example, I found that one way to segfault YARV is to do
Thread.kill(nil)
In this case, receiver is the class object Thread, and parameters is [nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )
Why it needs (de)serialization
The Small Eigen Collider needs serialization for a couple of reasons.
One is that using a random number generator to generate a series of random tasks every time isn't practical. JRuby has a different builtin random number generator, so even when given the same PRNG seed it'd give different tasks to YARV. Instead, what I do is I create a list of random tasks once (the first running of ruby bin/small_eigen_collider), have the initial running serialize the list of tasks to tasks.yml, and then have subsequent runnings of the program (using different Ruby implementations) read in that tasks.yml file to get the list of tasks.
Another reason I need serialization is that I want to be able to edit the list of tasks. If I have a long list of tasks that leads to a segmentation fault, I want to reduce the list to the minimum required to cause a segmentation fault. For example, with the following bug https://github.com/evanphx/rubinius/issues/643 ,
ObjectSpace.undefine_finalizer(:symbol)
by itself doesn't cause a segmentation fault, and nor does
Symbol.all_symbols.inspect
but putting the two together did. I started out with thousands of tasks, and needed to pare the list back to just those two.
Does deserialization returning existing class objects make sense in this context, or do you think there's a better way?
Answer 1:
Current state of my research:
To get your desired behavior working, you can use my workaround mentioned above.
Here is the nicely formatted code example:
require "psych"
string_yaml = Psych.dump(Marshal.dump(String))
# => "--- ! \"\\x04\\bc\\vString\"\n"
string_class = Marshal.load(Psych.load(string_yaml))
# => String
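The same workaround can be applied to a whole task rather than a single class. The following is only a sketch: the hash layout with the keys "receiver", "method_name" and "parameters" is a hypothetical format chosen for illustration, not the actual tasks.yml format of the Small Eigen Collider. It stores the class as Marshal data inside an otherwise plain YAML document and restores the existing class object on load.

```ruby
require "psych"

# Hypothetical task layout (an assumption, not the project's real format):
# string keys and plain values, with the class object stored as Marshal data.
task = {
  "receiver"    => Marshal.dump(Thread),
  "method_name" => "kill",
  "parameters"  => [nil],
}

# The YAML document only contains strings, nil, an array and a hash.
yaml = Psych.dump(task)
loaded = Psych.load(yaml)

# Marshal.load returns the existing Thread class, not an anonymous class.
receiver = Marshal.load(loaded.fetch("receiver"))
method_name = loaded.fetch("method_name").to_sym
parameters = loaded.fetch("parameters")

# The task itself is deliberately not run here.
puts receiver.inspect
```

The drawback is that the Marshal data is not pleasant to edit by hand, which matters if the task list is meant to be pared down manually.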
Your hack of modifying Class may never work, because real class handling isn't implemented in psych/yaml.
You can take this repo, tenderlove/psych, which is the standalone lib. (Gem: psych. To load it, use gem 'psych'; require 'psych' and check the version with Psych::VERSION.)
As you can see in lines 249-251, objects of the class Class aren't handled.
Instead of monkeypatching the class Class, I recommend contributing to the Psych lib by extending this class handling.
So in my mind the final yaml result should be something like: "--- !ruby/class String"
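Whatever tag is used, the loader has to turn the stored name back into the existing constant. A minimal sketch of that lookup, without any custom tag, is shown here; dump_class_by_name, load_class_by_name and the "class_name" key are hypothetical names for illustration and are not part of Psych.

```ruby
require "psych"

# Hypothetical helper (not part of Psych): store only the class name.
def dump_class_by_name(klass)
  Psych.dump("class_name" => klass.name)
end

# Hypothetical helper (not part of Psych): look the existing constant up
# by name, so the result is the real class object, not a new anonymous one.
def load_class_by_name(yaml)
  data = Psych.load(yaml)
  Object.const_get(data.fetch("class_name"))
end

yaml = dump_class_by_name(Thread)
restored = load_class_by_name(yaml)
puts restored.equal?(Thread)
```

This keeps the YAML human-editable, at the cost of the caller needing to know which entries are class names.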
After a night of thinking about it, I can say this feature would be really nice!
Update
Found a tiny solution which seems to work in the intended way:
code gist: gist.github.com/1012130 (with descriptive comments)
Answer 2:
The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!
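A short round trip can be used to check this, assuming a Psych version that includes the class support. One caveat that is not part of the original answer: since Psych 4, Psych.load is a safe loader that rejects most tagged objects, so this sketch uses Psych.unsafe_load where it exists and falls back to Psych.load on older versions.

```ruby
require "psych"

# Pick the permissive loader: unsafe_load on newer Psych versions,
# plain load on older ones where load is still permissive.
loader = Psych.respond_to?(:unsafe_load) ? :unsafe_load : :load

yaml = Psych.dump(String)
restored = Psych.public_send(loader, yaml)

puts yaml
# True when the loader returned the existing String class object.
puts restored.equal?(String)
```

Only use the permissive loader on YAML you produced yourself, such as a task list generated by an earlier run.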
Source: https://stackoverflow.com/questions/5774580/how-do-i-deserialize-classes-in-psych