I am using a Dictionary to store the frequency of colors in an image, where the key is the color (as an int), and the value is the number of times that color occurs.
Each dictionary entry holds two 4-byte integers: 8 bytes total. 8 bytes * 6 million entries is only about 48MB, +/- some space for object overhead, alignment, etc. There's plenty of space in memory for this. A 32-bit .Net process gets up to 2 GB of virtual address space, so 48MB or so shouldn't cause a problem.
I expect what's actually happening here is related to how the dictionary auto-expands and how the garbage collector handles (or doesn't handle) compaction.
First, the auto-expanding part. Last time I checked (back around .Net 2.0*), collections in .Net tended to use arrays internally. They would allocate a reasonably-sized array in the collection constructor (say, 10 items), and then use a doubling algorithm to create additional space whenever the array filled up. All the existing items would have to be copied to the new array, but then the old array could be garbage collected. The garbage collector is pretty reliable about this, and so it means you're left using space for at most 2n - 1 items in the collection.
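To make the doubling behaviour concrete, here is a minimal sketch that records every capacity change while a collection fills up. It uses List<T> as a stand-in because its Capacity property makes the growth easy to observe; the GrowthDemo class and ObserveCapacities method are names made up for this illustration, and the exact capacities you see depend on the runtime version.

```csharp
using System;
using System.Collections.Generic;

static class GrowthDemo
{
    // Returns each distinct Capacity value observed while adding `count` items.
    // Every entry after the first corresponds to one reallocate-and-copy
    // of the list's backing array.
    public static List<int> ObserveCapacities(int count)
    {
        var items = new List<int>();      // no initial capacity given
        var capacities = new List<int>();
        int last = -1;

        for (int i = 0; i < count; i++)
        {
            items.Add(i);
            if (items.Capacity != last)
            {
                last = items.Capacity;
                capacities.Add(last);
            }
        }
        return capacities;
    }
}
```

Printing the result of GrowthDemo.ObserveCapacities(100) should show a short, rapidly growing sequence. While each of those reallocations is in progress, the old and the new array are both alive, which is where the extra memory pressure comes from.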
Now the Garbage Collector compaction part. After a certain size, these arrays end up in a section of memory called the Large Object Heap. Garbage Collection still works here (though less often). What doesn't really work well here is compaction (think memory defragmentation). The physical memory used by the old object will be released, returned to the operating system, and available for other processes. However, the virtual address space in your process (the table that maps program memory offsets to physical memory addresses) will still have the (empty) space reserved.
This is important, because remember: we're working with a rapidly growing object. It's possible for such an object to take up address space far larger than the final size of the object itself. An object grows enough, fast enough, and suddenly you get an OutOfMemoryException, even though your app isn't really using all that much RAM.
The first solution here is to allocate enough space in the initial collection for all of your data. This allows you to skip all those re-allocations and copying. Your data will live in a single array, and use only the space you actually asked for. Most collections, including the Dictionary, have an overload for the constructor that allows you to give it the number of items you want the first array to use. Be careful here: you don't need to allocate an item for every pixel in your image. There will be a lot of repeated colors. You only need to allocate enough to have space for each color in your image. If it's only large images that give you problems, and you're almost handling them with six million records, you might find that 8 million is plenty.
My next suggestion is to group your pixel colors. A human can't tell and doesn't care if two colors are just one bit apart in any of the rgb components. You might go as far as to look at the separate RGB values for each pixel and normalize the pixel so that you only care about changes of 5 or more for an R, G, or B value. Integer division by 5 leaves 52 levels per channel (0 through 51), so that would get you from 16.7 million potential colors all the way down to only about 140,000 (52 * 52 * 52 = 140,608), and the data will likely be more useful, too. That might look something like this:
// 52 levels per channel, so at most 52^3 = 140,608 distinct keys
var colorCounts = new Dictionary<Color, int>(140608);
foreach (Color c in GetImagePixels().Select(p => Color.FromArgb((p.R / 5) * 5, (p.G / 5) * 5, (p.B / 5) * 5)))
{
    // the indexer throws on a missing key, so read the current count first
    int n;
    colorCounts.TryGetValue(c, out n); // n stays 0 if the color is new
    colorCounts[c] = n + 1;
}
* IIRC, somewhere in a recent or upcoming version of .Net both of these issues are being addressed. One by allowing you to force compaction of the LOH, and the other by using a set of arrays for collection backing stores, rather than trying to keep everything in one big array.
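For the first of those, .NET Framework 4.5.1 and later expose a GCSettings.LargeObjectHeapCompactionMode property. The following is a minimal sketch of requesting a one-off LOH compaction, assuming you are on a runtime that has this property; the LohCompaction wrapper class is just a name made up for the example, and whether compaction actually helps in a given app is something you would have to measure.

```csharp
using System;
using System.Runtime;

static class LohCompaction
{
    // Asks the GC to compact the Large Object Heap during the next
    // blocking generation 2 collection, then triggers that collection.
    public static void CompactOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // blocking full collection; this can be expensive
    }
}
```

This is a blunt tool: it pauses the app for a full collection, so it is probably best reserved for a point right after a large temporary structure has been released.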
Try using an array instead. I doubt it will run out of memory. 6 million int array elements is not a big deal.
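One way to read that suggestion, as a rough sketch: index a flat array directly by the 24-bit RGB value, so there is a single allocation up front and no hashing or resizing. This assumes pixels arrive as 32-bit ARGB ints and that alpha can be ignored (colors differing only in alpha are counted together); the ColorHistogram class and its Count method are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

static class ColorHistogram
{
    // One counter per possible 24-bit RGB value:
    // 2^24 ints = 64 MB, allocated exactly once.
    public static int[] Count(IEnumerable<int> pixels)
    {
        var counts = new int[1 << 24];
        foreach (int argb in pixels)
        {
            counts[argb & 0xFFFFFF]++; // mask off the alpha byte
        }
        return counts;
    }
}
```

The array is larger than a dictionary holding only the colors that actually occur, but its size is fixed and known in advance, so it never goes through the grow-and-copy cycle.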
In the 32 bit runtime, the maximum number of items you can have in a Dictionary<int, int> is in the neighborhood of 61.7 million. See my old article for more info.
If you're running in 32 bit mode, then your entire application plus whatever bits of ASP.NET and the underlying machinery are required all have to fit within the memory available to your process: normally 2 GB in the 32-bit runtime.
By the way, a really wacky way to solve your problem (but one I wouldn't recommend unless you're really hurting for memory), would be the following (assuming a 24-bit image):
LockBits to get a pointer to the raw image data
int[count,2] to hold the values and their occurrence counts.
I wouldn't honestly suggest using this method. Just got a little laugh when I thought of it.
The maximum size limit provided by CLR is 2GB
When you run a 64-bit managed application on a 64-bit Windows operating system, you can create an object of no more than 2 gigabytes (GB).
You may be better off using an array.
You may also check this BigArray<T>, getting around the 2GB array size limit
Update: Given the OP's sample image, it seems that the maximum number of items would be over 16 million, and apparently even that is too much to allocate when instantiating the dictionary. I see three options here:
Previous answer: the problem is that you don't allocate enough space for your dictionary. At some point, when it is expanding, you just run out of memory for the expansion, but not necessarily for the new dictionary.
Example: this code runs out of memory at nearly 24 million entries (on my machine, running in 32-bit mode):
Dictionary<int, int> count = new Dictionary<int, int>();
for (int i = 0; ; i++)
count.Add(i, i);
because during the last expansion it is still holding the space for the entries already there while it tries to allocate new space for many million more, and that is too much.
Now, if we initially allocate space for, say, 40 million entries, it runs without problem:
Dictionary<int, int> count = new Dictionary<int, int>(40000000);
So try to indicate how many entries there will be when creating the dictionary.
From MSDN:
The capacity of a Dictionary is the number of elements that can be added to the Dictionary before resizing is necessary. As elements are added to a Dictionary, the capacity is automatically increased as required by reallocating the internal array. If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.