I am using Dictionary and I need to store almost 13 000 000 keys in it. Unfortunatelly, after adding 11 950 000th key I got an exception \"System out of memory\". Is there a
C# is not a language that was designed to solve heavy-duty scientific computation problems. It is absolutely possible to use C# to build tools that do what you want, but the off-the-shelf parts like Dictionary were designed to solve more common business problems, like mapping zip codes to cities and that sort of thing.
You're going to have to go with external storage of some sort. My recommendation would be to buy a database and use it to store your data. Then use a DataSet or some similar technology to load portions of the data into memory, manipulate them, and then pour more data from the database into the DataSet, and so on.
Well, I had almost exactly the same problem.
I wanted to load about 12.5 million [string, int]s into a dictionary from a database (for all the programming "gods" above who don't understand why, the answer is that it is enormously quicker when you are working with a 150 GB database if you can cache a proportion of one of the key tables in memory).
It annoyingly threw an out of memory exception at pretty much the same place - just under the 12 million mark even though the process was only consuming about 1.3 GB of memory (reduced to about 800 MB of memory after a judicious change in db read method to not try and do it all at once) - despite running on an I7 with 8 GB of memory.
The solution was actually remarkably simple - in Visual Studio (2010) in Solution Explorer right click the project and select properties. In the Build tab set Platform Target to x64 and rebuild.
It rattles through the load into the Dictionary in a few seconds and the Dictionary performance is very good.
It is not the problem with the Dictionary object, but the available memory in your server. I've done some investigation to understand the failures of dictionary object, but it never failed. Below is the code for your reference
private static void TestDictionaryLimit()
{
int intCnt = 0;
Dictionary<long, string> dItems = new Dictionary<long, string>();
Console.WriteLine("Total number of iterations = {0}", long.MaxValue);
Console.WriteLine("....");
for (long lngCnt = 0; lngCnt < long.MaxValue; lngCnt++)
{
if (lngCnt < 11950020)
dItems.Add(lngCnt, lngCnt.ToString());
else
break;
if ((lngCnt % 100000).Equals(0))
Console.Write(intCnt++);
}
Console.WriteLine("Completed..");
Console.WriteLine("{0} number of items in dictionary", dItems.Count);
}
The above code executes properly, and stores more than the number of count that you have mentioned.
Buy more memory, install a 64 bit version of the OS and recompile for 64 bits. No, I'm not kidding. If you want so many objects... in ram... And then call it a "feature". If the new Android can require 16gb of memory to be compiled...
I was forgetting... You could begin by reading C# array of objects, very large, looking for a better way
You know how many are 13 million objects?
To make a comparison, a 32 bits Windows app has access to less than 2 gb of address space. So it's 2 billion bytes (give or take)... 2 billion / 13 million = something around 150 bytes/object. Now, if we consider how much a reference type occupies... It's quite easy to eat 150 bytes.
I'll add something: I've looked in my Magic 8-Ball
and it told me: show us your code. If you don't tell us what you are using for the key and the values, how should we be able to help you? What are you using, class
or struct
or "primitive" types? Tell us the "size" of your TKey
and TValue
. Sadly our crystall ball broke yesterday :-)
Really 13000000 items are quite a lot. If 13000000 are allocated classes is a very deep kick into garbage collector stomach!
Also if you find a way to use the default .NET dictionary, the performance would be really bad, too much keys, the number of keys approaches the number of values a 31 bit hash can use, performance will be awful in whatever system you use, and of course, memory will be too much!
If you need a data structure that can use more memory than an hash table you probably need a custom hashtable mixed with a custom binary tree data structure. Yes, it is possible to write your own combination of two.
You cannot rely on .net hashtable for sure for this so strange and specific problem.
Consider that a tree have a lookup complexity of O(log n), while a building complexity of O(n * log n), of course, building it will be too long. You should then build an hashtable of binary trees (or viceversa) that will allow you to use both data structures consuming less memory.
Then, think about compiling it in 32 bit mode, not in 64 bit mode: 64 bit mode uses more memory for pointers. In the same time it i spossible the contrary, 32 bit address space may be is not sufficient for your problem. It never happened to me to have a problem that can run out 32 bit address space!
If both keys and values are simple value types i would suggest you to write your data structure in a C dll and use it through C#.
You can try to write a dictionary of dictionaries. Let's say, you can split your data into chunks of 500000 items between 26 dictionaries for example, but the occupied memory would be very very big, don't think your system will handle it.
public class MySuperDictionary
{
private readonly Dictionary<KEY, VALUE>[] dictionaries;
public MySuperDictionary()
{
this.dictionaries = new Dictionary<KEY, VALUE>[373]; // must be a prime number.
for (int i = 0; i < dictionaries.Length; ++i)
dictionaries[i] = new Dicionary<KEY, VALUE>(13000000 / dictionaries.Length);
}
public void Add(KEY key, VALUE value)
{
int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
dictionaries[bucket].Add(key, value);
}
public bool Remove(KEY key)
{
int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
return dictionaries[bucket].Remove(key);
}
public bool TryGetValue(KEY key, out VALUE result)
{
int bucket = (GetSecondaryHashCode(key) & 0x7FFFFFFF) % dictionaries.Length;
return dictionaries[bucket].TryGetValue(key, out result);
}
public static int GetSecondaryHashCode(KEY key)
{
here you should return an hash code for key possibly using a different hashing algorithm than the algorithm you use in inner dictionaries
}
}
I think that you need a new approach to your processing.
I must assume that you obtain the data from a file or a database, either way that is where it should remain.
There is no way that you may actually increase the limit on the number of values stored within a Dictionary, other than increasing system memory, but eitherway it is an extremely inefficient means of processing such a alarge amount of data.
You should rethink your algorithm so that you can process the data in more manageable portions. It will mean processing it in stages until you get your result. This may mean many hundreeds of passes through the data, but it's the only way to do it.
I would also suggest that you look at using generics to help speed up this repetitive processing and cut down on memory usage.
Remember that there will still be a balancing act between system performance and access to externally stored data (be it external disk store or database).