I am trying to insert about 50,000 objects (and therefore 50,000 keys) into a java.util.HashMap
. However, I keep getting an OutOfMemo
Some people are suggesting changing the parameters of the HashMap to tighten up the memory requirements. I would suggest to measure instead of guessing; it might be something else causing the OOME. In particular, I'd suggest using either NetBeans Profiler or VisualVM (which comes with Java 6, but I see you're stuck with Java 5).
You can increase the maximum heap size by passing -Xmx128m (where 128 is the number of megabytes) to java. I can't remember the default size, but it strikes me that it was something rather small.
You can programmatically check how much memory is available by using the Runtime class.
// Get current size of heap in bytes
long heapSize = Runtime.getRuntime().totalMemory();
// Get maximum size of heap in bytes. The heap cannot grow beyond this size.
// Any attempt will result in an OutOfMemoryException.
long heapMaxSize = Runtime.getRuntime().maxMemory();
// Get amount of free memory within the heap in bytes. This size will increase
// after garbage collection and decrease as new objects are created.
long heapFreeSize = Runtime.getRuntime().freeMemory();
(Example from Java Developers Almanac)
This is also partially addressed in Frequently Asked Questions About the Java HotSpot VM, and in the Java 6 GC Tuning page.
Implicit in these answers it that Java has a fixed size for memory and doesn't grow beyond the configured maximum heap size. This is unlike, say, C, where it's constrained only by the machine on which it's being run.
The implementations are backed by arrays usually. Arrays are fixed size blocks of memory. The hashmap implementation starts by storing data in one of these arrays at a given capacity, say 100 objects.
If it fills up the array and you keep adding objects the map needs to secretly increase its array size. Since arrays are fixed, it does this by creating an entirely new array, in memory, along with the current array, that is slightly larger. This is referred to as growing the array. Then all the items from the old array are copied into the new array and the old array is dereferenced with the hope it will be garbage collected and the memory freed at some point.
Usually the code that increases the capacity of the map by copying items into a larger array is the cause of such a problem. There are "dumb" implementations and smart ones that use a growth or load factor that determines the size of the new array based on the size of the old array. Some implementations hide these parameters and some do not so you cannot always set them. The problem is that when you cannot set it, it chooses some default load factor, like 2. So the new array is twice the size of the old. Now your supposedly 50k map has a backing array of 100k.
Look to see if you can reduce the load factor down to 0.25 or something. this causes more hash map collisions which hurts performance but you are hitting a memory bottleneck and need to do so.
Use this constructor:
(http://java.sun.com/javase/6/docs/api/java/util/HashMap.html#HashMap(int, float))
Another thing to try if you know the number of objects beforehand is to use the HashMap(int capacity,double loadfactor) constructor instead of the default no-arg one which uses defaults of (16,0.75). If the number of elements in your HashMap exceeds (capacity * loadfactor) then the underlying array in the HashMap will be resized to the next power of 2 and the table will be rehashed. This array also requires a contiguous area of memory so for example if you're doubling from a 32768 to a 65536 size array you'll need a 256kB chunk of memory free. To avoid the extra allocation and rehashing penalties, just use a larger hash table from the start. It'll also decrease the chance that you won't have a contiguous area of memory large enough to fit the map.
The Java heap space is limited by default, but that still sounds extreme (though how big are your 50000 segments?)
I am suspecting that you have some other problem, like the arrays in the set growing too big because everything gets assigned into the same "slot" (also affects performance, of course). However, that seems unlikely if your points are uniformly distributed.
I'm wondering though why you're using a HashMap rather than a TreeMap? Even though points are two dimensional, you could subclass them with a compare function and then do log(n) lookups.