Question
I'm writing an A.I. to solve a "Maze of Life" puzzle. Attempting to store states in a `HashSet` slows everything down: it's faster to run without a set of explored states. I'm fairly confident my node (state storage) implements `equals` and `hashCode` correctly, as tests show a `HashSet` doesn't add duplicate states. I may need to rework the `hashCode` function, but I believe what's slowing it down is the `HashSet` rehashing and resizing.
I've tried setting the initial capacity to a very large number, but it's still extremely slow:
```scala
val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
val frontier = new QuickQueue[Node](initCapacity)
```
Here is the quick queue code:
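```scala
import java.util.HashSet
import scala.collection.mutable.Queue // assumed: the original post omits its imports

class QuickQueue[T](capacity: Int) {
  val hashSet = new HashSet[T](capacity)
  val queue = new Queue[T]
  // methods below (omitted in the original post)
}
```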
For more info, here is the hash function. I store the grid values as bytes in two arrays and access them using tuples:
```scala
override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
    for (x <- 0 until grid.width) {
      sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
    }
    sum += Math.pow(sum, y).toInt
  }
  return sum
}
```
Any suggestions on how to set up a `HashSet` that won't slow things down? Or another suggestion for how to remember explored states?

P.S. I'm using `java.util.HashSet`, and even with the initial capacity set, it takes 80 seconds vs. under 7 seconds without the set.
Answer 1:
Okay, for a start, please replace

```scala
override def hashCode(): Int =
```

with

```scala
override lazy val hashCode: Int =
```

so you don't calculate `grid.height*grid.width` floating-point powers every time you need to access the hash code. That should speed things up enormously.
Then, unless you somehow rely on nearby cells having nearby hash codes, don't reinvent the wheel. Use `scala.util.hashing.MurmurHash3.seqHash` or something similar to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the `lazy val`.)
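A minimal sketch combining both suggestions, assuming a hypothetical immutable `Node` that stores the grid flattened row by row into a `Vector[Byte]` (the asker's actual two-array layout is not shown, so `width`, `cells` and `goal` are assumed names). The hash is memoized with a `lazy val` and computed by `MurmurHash3.seqHash`:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical immutable search state. The grid is assumed to be flattened
// row by row into `cells`; `width` and `goal` stand in for the question's fields.
final class Node(val width: Int, val cells: Vector[Byte], val goal: (Int, Int)) {

  // Computed on first access, then cached for the lifetime of the object.
  override lazy val hashCode: Int = {
    // seqHash mixes every cell; width and goal are folded in afterwards.
    val h = MurmurHash3.seqHash(cells)
    31 * (31 * h + width) + goal.##
  }

  // Must agree with hashCode: equal nodes have equal width, cells and goal.
  override def equals(other: Any): Boolean = other match {
    case that: Node =>
      // Comparing the cached hashes first rejects most unequal nodes cheaply.
      hashCode == that.hashCode && width == that.width &&
        goal == that.goal && cells == that.cells
    case _ => false
  }
}
```

With this in place, a `java.util.HashSet[Node]` only pays for one full pass over the cells per node, plus an element-wise comparison when two hashes actually match.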
Then the only overhead left is the required set operations. Right now, unless you have a lot of 0x0 grids, you are spending the overwhelming majority of your time waiting for `math.pow` to return a result (and risking everything becoming `Double.PositiveInfinity` or `0.0`, depending on how big the values are, which will create hash collisions that slow things down still further).
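To make the collision risk concrete, here is a small sketch that re-implements the question's hash loop over a plain `Array[Array[Byte]]` (a hypothetical stand-in for the asker's grid type; the goal term is dropped because it adds the same constant to every grid). It shows that `Double`-to-`Int` conversion saturates at `Int.MaxValue`, and that `Math.pow(v, 0)` is 1 for every `v`, so the first column of the grid never influences the result:

```scala
object PowCollisions {
  // Same loop as the question's hashCode, indexed as grid(y)(x).
  def powHash(grid: Array[Array[Byte]]): Int = {
    var sum = 0
    for (y <- grid.indices) {
      for (x <- grid(y).indices) {
        sum += Math.pow(grid(y)(x).toDouble, x.toDouble).toInt
      }
      sum += Math.pow(sum, y).toInt
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    // Double-to-Int conversion clamps large values to Int.MaxValue.
    println(Math.pow(1000, 4).toInt == Math.pow(2000, 4).toInt) // true
    println(Math.pow(10, 400))   // Infinity
    println(Math.pow(0.5, 2000)) // 0.0
    // pow(v, 0) == 1 for every v, so these two different grids collide.
    val a = Array(Array[Byte](0, 1), Array[Byte](1, 1))
    val b = Array(Array[Byte](5, 1), Array[Byte](7, 1))
    println(powHash(a) == powHash(b)) // true
  }
}
```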
Answer 2:
Note that the following assumes all your objects are immutable. This is a sane assumption when using hashing.
Also, you should profile your code before optimizing it (e.g. with the free jvisualvm, which ships with JDK 8 and earlier and is a separate download, VisualVM, for newer JDKs).
Memoization for fast hashCode
Computing the hash code is usually a bottleneck. By computing the hash code only once per object and storing the result, you reduce the cost of hash-code computation to a minimum (once per object: at construction for a `val`, on first access for a `lazy val`) at the expense of increased space consumption (probably moderate). To achieve this, turn the `def hashCode` into a `lazy val` or a `val`.
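A minimal sketch of the two memoization variants, assuming hypothetical immutable state classes whose grid is a `Vector[Byte]`. The `HashCounter` object is also hypothetical and exists only to show when the hash body actually runs:

```scala
import scala.util.hashing.MurmurHash3

object HashCounter { var computations = 0 } // demo-only counter

// Hypothetical immutable states; `cells` must never change after construction.
final class EagerState(val cells: Vector[Byte]) {
  // `val`: computed exactly once, in the constructor, even if never used.
  override val hashCode: Int = {
    HashCounter.computations += 1
    MurmurHash3.seqHash(cells)
  }
  override def equals(o: Any): Boolean = o match {
    case that: EagerState => hashCode == that.hashCode && cells == that.cells
    case _                => false
  }
}

final class LazyState(val cells: Vector[Byte]) {
  // `lazy val`: computed once, on first access; later reads only pay
  // a small initialization check.
  override lazy val hashCode: Int = {
    HashCounter.computations += 1
    MurmurHash3.seqHash(cells)
  }
  override def equals(o: Any): Boolean = o match {
    case that: LazyState => hashCode == that.hashCode && cells == that.cells
    case _               => false
  }
}
```

Constructing an `EagerState` bumps the counter immediately; constructing a `LazyState` does not until its `hashCode` is first read. After that, a thousand further reads of either hash leave the counter unchanged. A `val` pays for every object created, even ones that never reach the set, while a `lazy val` defers the cost to first use.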
Interning for fast equals
Once you have eliminated the cost of `hashCode`, computing `equals` becomes the problem. `equals` is particularly expensive for collection fields and deep structures in general.
You can minimize the cost of `equals` by interning. This means that you obtain new objects of the class through a factory method, which checks whether the requested object already exists and, if so, returns a reference to the existing object. If you ensure that every object of this type is constructed this way, you know there is only one instance of each distinct object, and `equals` becomes equivalent to object identity, which is a cheap reference comparison (`eq` in Scala).
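A minimal interning sketch, assuming a hypothetical immutable `Cells` wrapper around a `Vector[Byte]`. The constructor is private, so the companion's `apply` is the only way to create instances, and `Cells` deliberately keeps the default identity-based `equals` and `hashCode` inherited from `AnyRef`:

```scala
import scala.collection.mutable

// Hypothetical immutable state. No equals/hashCode override: AnyRef's
// identity semantics are correct because every instance is interned.
final class Cells private (val values: Vector[Byte])

object Cells {
  // Intern table: each distinct value maps to its single canonical instance.
  private val pool = mutable.HashMap.empty[Vector[Byte], Cells]

  // Factory: returns the existing instance if one exists, else creates it.
  def apply(values: Vector[Byte]): Cells =
    pool.getOrElseUpdate(values, new Cells(values))
}
```

The structural hash and comparison still happen once per `Cells(...)` call, inside the intern-table lookup; after that, every set operation on the interned objects is a reference comparison. This pool is not thread-safe and never shrinks, so every distinct state stays reachable for the lifetime of the program.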
Source: https://stackoverflow.com/questions/14714900/optimal-hashset-initialization-scala-java