Question

I'm writing an AI to solve a "Maze of Life" puzzle. Storing explored states in a HashSet slows everything down; the search actually runs faster without a set of explored states. I'm fairly confident my node class (the state storage) implements equals and hashCode correctly, since tests show the HashSet doesn't add duplicate states. I may need to rework the hashCode function, but I believe the slowdown comes from the HashSet rehashing and resizing.

I've tried setting the initial capacity to a very large number, but it's still extremely slow:

 val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
 val frontier = new QuickQueue[Node](initCapacity)

Here is the quick queue code:

class QuickQueue[T](capacity: Int) {

val hashSet = new HashSet[T](capacity)
val queue = new Queue[T]
    //methods below

For more info, here is the hash function. I store the grid values in bytes in two arrays and access it using tuples:

override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
     for (x <- 0 until grid.width) {
        sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
     }
     sum += Math.pow(sum, y).toInt
  }
  return sum
}

Any suggestions on how to set up a HashSet that won't slow things down? Or another way to remember explored states?

P.S. I'm using java.util.HashSet, and even with the initial capacity set, it takes 80 seconds versus under 7 seconds without the set.

Answer 1:

Okay, for a start, please replace

override def hashCode(): Int =

with

override lazy val hashCode: Int =

so you don't calculate (grid.height*grid.width) floating point powers every time you need to access the hash code. That should speed things up by an enormous amount.

Then, unless you somehow rely upon close cells having close hash codes, don't re-invent the wheel. Use scala.util.hashing.MurmurHash3.seqHash or somesuch to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the lazy val.)
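To make the two suggestions concrete, here is a minimal sketch rather than the asker's actual Node. GridState, its flat cells array, and the goal field are hypothetical stand-ins, and the sketch assumes the array is never mutated after construction (otherwise the cached hash goes stale). It uses MurmurHash3.bytesHash for the byte array and seqHash to mix in the remaining fields.

```scala
import java.util.Arrays
import scala.util.hashing.MurmurHash3

object MemoizedHashSketch {
  // Hypothetical stand-in for the question's Node: a flat byte grid
  // plus goal coordinates. Assumes `cells` is never mutated.
  final class GridState(val width: Int,
                        val height: Int,
                        private val cells: Array[Byte],
                        val goal: (Int, Int)) {
    require(cells.length == width * height, "cells must hold width * height bytes")

    def apply(x: Int, y: Int): Byte = cells(y * width + x)

    // Computed at most once, on first access, then cached in the object.
    override lazy val hashCode: Int =
      MurmurHash3.seqHash(
        Seq(width, height, goal._1, goal._2, MurmurHash3.bytesHash(cells)))

    // Arrays compare by reference in Scala, so compare contents explicitly.
    // The cached hash serves as a cheap early-out before the full comparison.
    override def equals(other: Any): Boolean = other match {
      case that: GridState =>
        (this eq that) ||
          (hashCode == that.hashCode &&
            width == that.width && height == that.height &&
            goal == that.goal && Arrays.equals(cells, that.cells))
      case _ => false
    }
  }
}
```

With this in place, a java.util.HashSet pays for the hash once per state instead of on every lookup, insert, and resize.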

Then you only have overhead from the required set operations. Right now, unless you have a lot of 0x0 grids, you are using up the overwhelming majority of your time waiting for math.pow to give you a result (and risking everything becoming Double.PositiveInfinity or 0.0, depending on how big the values are, which will create hash collisions which will slow things down still further).
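The collision risk is easy to see in isolation. The sketch below uses made-up input values, not anything from the asker's grids; it only shows that converting an oversized Double to Int saturates, so very different powers collapse to the same number before they are added to the sum.

```scala
object PowSaturationSketch {
  // Double-to-Int conversion saturates: anything above Int.MaxValue,
  // including infinity, becomes Int.MaxValue.
  val big: Int    = math.pow(100.0, 20.0).toInt       // 1e40 does not fit in an Int
  val bigger: Int = math.pow(120.0, 30.0).toInt       // a very different power, same result
  val inf: Int    = Double.PositiveInfinity.toInt

  // All three distinct inputs contribute the same term to a hash sum.
  val allCollide: Boolean =
    big == Int.MaxValue && bigger == Int.MaxValue && inf == Int.MaxValue
}
```

Once the running sum itself is fed back into math.pow, as in the outer loop of the original hashCode, this saturation becomes very likely for anything but tiny grids.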

Answer 2:

Note that the following assumes all your objects are immutable. This is a sane assumption when using hashing.

Also, you should profile your code before applying any optimization (e.g. with the free VisualVM, which shipped with older JDKs as jvisualvm and is a separate download for newer ones).

Memoization for fast `hashCode`

Computing the hash code is usually a bottleneck. By computing the hash code only once for each object and storing the result, you can reduce the cost of hash code computation to a minimum (once at object creation) at the expense of increased space consumption (probably moderate). To achieve this, turn the def hashCode into a lazy val or val.

Interning for fast `equals`

Once you have the cost of hashCode eliminated, computing equals becomes a problem. equals is particularly expensive for collection fields and deep structures in general.

You can minimize the cost of equals by interning. This means that you acquire new objects of the class through a factory method, which checks whether the requested object already exists and, if so, returns a reference to the existing object. If you ensure that every object of this type is constructed this way, you know there is only one instance of each distinct object, and equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).
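A minimal sketch of such a factory follows. State, its cells field, and the pool are hypothetical names chosen for illustration. The sketch assumes single-threaded use and never evicts entries, so the pool keeps every state alive; a real solver might want a weak or bounded pool instead.

```scala
import scala.collection.mutable

object InterningSketch {
  // The constructor is private, so instances can only come from State.apply.
  // equals and hashCode are deliberately NOT overridden: the defaults
  // inherited from AnyRef are identity-based, which is what interning relies on.
  final class State private (val cells: Vector[Byte])

  object State {
    // Pool keyed by content. Not thread-safe, and it never shrinks.
    private val pool = mutable.HashMap.empty[Vector[Byte], State]

    // Returns the one canonical instance for this content.
    def apply(cells: Vector[Byte]): State =
      pool.getOrElseUpdate(cells, new State(cells))
  }
}
```

Note the trade-off: the pool lookup still does one structural hash and possibly one structural comparison per construction. The saving is that every later comparison, including those inside the explored-state set, is a reference check.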

Source: https://stackoverflow.com/questions/14714900/optimal-hashset-initialization-scala-java

Tags

scala

optimization

hashset

Optimal HashSet Initialization (Scala | Java)
