Question
I have a large dataset from an analytics provider.
It arrives in JSON and I parse it into a hash, but due to the size of the set I'm ballooning to over a gig in memory usage. Almost everything starts as strings (a few values are numerical), and while of course the keys are duplicated many times, many of the values are repeated as well.
So I was thinking, why not symbolize all the (non-numerical) values, as well?
I've found some discussion of potential problems, but I figure it would be nice to have a comprehensive description for Ruby, since the problems seem dependent on the implementation of the interning process (what happens when you symbolize a string).
I found this talking about Java: Is it good practice to use java.lang.String.intern()?
- The interning process can be expensive
- Interned strings are never de-allocated, resulting in a memory leak
(Except there's some contention on that last point.)
So, can anyone give a detailed explanation of when not to intern strings in Ruby?
Answer 1:
- When the list of things in question is an open set (i.e., dynamic, with no fixed inventory), you should not convert them into symbols. Before Ruby 2.2, symbols were never garbage collected, so each newly created symbol leaked memory.
- When the list of things in question is a closed set (i.e., static, with a fixed inventory), you are better off converting them into symbols. Each symbol is created only once and then reused, which saves memory.
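Applied to the question's JSON data, the keys are the closed set and the values are the open one. A minimal sketch of symbolizing only the keys, assuming the standard `json` library; the payload below is a made-up stand-in for the real analytics data:

```ruby
require "json"

# Hypothetical sample payload standing in for the analytics data.
raw = '[{"country":"USA","visits":3},{"country":"USA","visits":5}]'

# symbolize_names: true converts only the hash keys (a closed set) to
# symbols; the values stay as ordinary strings and numbers.
rows = JSON.parse(raw, symbolize_names: true)

keys = rows.flat_map(&:keys).uniq
values_are_strings = rows.all? { |row| row[:country].is_a?(String) }

puts keys.inspect
puts values_are_strings
```

This keeps the open-ended values out of the symbol table while still avoiding one key string per row.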
Answer 2:
The interning process can be expensive
There is always a tradeoff between memory and computing power. Try some of the best practices out there and benchmark to figure out what's right for you. A few suggestions I'd like to mention:
Symbols are an excellent choice for hash keys:
{name: "my name"}
Freeze strings to save memory, and try to keep the string pool small:
person[:country] = "USA".freeze
Have fun with Ruby GC tuning.
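The string-freezing suggestion above can be extended to values that repeat across a parsed dataset. A rough sketch, assuming Ruby 2.5 or later, where `String#-@` returns a frozen, deduplicated copy; `dedup_strings` is a hypothetical helper name, not part of any library:

```ruby
# Hypothetical helper: recursively replace every String value with its
# frozen, deduplicated copy (String#-@ deduplicates on Ruby 2.5+).
def dedup_strings(obj)
  case obj
  when String then -obj
  when Array  then obj.map { |element| dedup_strings(element) }
  when Hash   then obj.transform_values { |value| dedup_strings(value) }
  else obj
  end
end

# Two rows holding separate String objects with the same content.
rows = [
  { country: "USA".dup, visits: 3 },
  { country: "USA".dup, visits: 5 }
]

compact = dedup_strings(rows)

# Both rows now point at one shared, frozen String object.
same_object = compact[0][:country].equal?(compact[1][:country])
puts same_object
```

Unlike symbols, these deduplicated strings are still ordinary strings, so code that expects string values keeps working, and they can be collected once nothing references them.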
Interned strings are never de-allocated, resulting in a memory leak
- Ruby 2.2 introduced symbol garbage collection, so this concern is largely no longer valid for symbols created dynamically (e.g., with to_sym). However, overusing frozen strings and symbols can still hurt performance.
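One way to check how symbol garbage collection behaves on a given Ruby version is to watch the size of the symbol table. This is only an informal sketch: the symbol names are invented for illustration, and the exact counts depend on the Ruby version and on when the GC runs, so no particular output is assumed.

```ruby
before = Symbol.all_symbols.size

# Create many symbols dynamically and keep no reference to them.
10_000.times { |i| "hypothetical_metric_#{i}".to_sym }
after_create = Symbol.all_symbols.size

# Ask for a full collection; on Ruby 2.2+ unreferenced dynamic symbols
# are eligible to be collected here.
GC.start
after_gc = Symbol.all_symbols.size

puts "before: #{before}, after to_sym: #{after_create}, after GC: #{after_gc}"
```

If the count after `GC.start` stays close to the count after `to_sym`, the symbols are not being collected on that interpreter, and the open-set warning from Answer 1 applies in full.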
Source: https://stackoverflow.com/questions/16289455/when-not-to-use-to-sym-in-ruby