I'm running up against the 2 GB object limit in C# (this applies even on 64-bit, for some annoying reason) with a large collection of structs (est. size of 4.2 GB in total).
There's an interesting post around this subject here:
http://blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspx
which talks about writing your own 'BigArray' object.
In versions of .NET prior to 4.5, the maximum object size is 2 GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for string is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.
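For reference, opting in is done with the gcAllowVeryLargeObjects element in the application's config file; a minimal skeleton looks like this:

    <configuration>
      <runtime>
        <!-- 64-bit only: lifts the 2 GB total-size cap on arrays. The element
             count per dimension is still limited to roughly Int32.MaxValue. -->
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>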
The List<T> holds references, which are 4 or 8 bytes each depending on whether you're running in 32-bit or 64-bit mode. So referencing a 2 GB object would not grow the List<T> itself by 2 GB; it grows it only by the number of bytes needed to hold the reference. This allows you to reference millions of objects, and each object could be 2 GB. If you have 4 objects in the List<T> and each is 2 GB, you would have 8 GB worth of objects referenced by the List<T>, but the List<T> object itself would have used up only an extra 4 * 8 = 32 bytes (on 64-bit).
The number of references you can hold before the List<T> itself hits the 2 GB limit is 536.87 million on a 32-bit machine and 268.43 million on a 64-bit machine.
536 million references * 2 GB = A LOT OF DATA!
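Those figures fall straight out of the per-reference cost. As a quick sketch (IntPtr.Size is 4 or 8 depending on whether the process is 32-bit or 64-bit):

    using System;

    class RefMath
    {
        static void Main()
        {
            long maxObjectSize = 2L * 1024 * 1024 * 1024;      // the 2 GB single-object cap
            long maxReferences = maxObjectSize / IntPtr.Size;  // 4 or 8 bytes per reference
            Console.WriteLine(maxReferences);  // 536,870,912 on 32-bit; 268,435,456 on 64-bit
        }
    }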
P.S. As Reed pointed out, the above is true for reference types but not for value types. So if you're holding value types, your workaround is valid. Please see the comment below for more info.
Now, obviously using a List<T> is going to give me a list of size 4.2 GB, give or take, but would using a list made up of smaller lists, which in turn contain a portion of the structs, allow me to jump this limit?
Yes - though, if you're trying to work around this limit, I'd consider using arrays yourself instead of letting the List<T> class manage the array.
The 2 GB single-object limit in the CLR is exactly that: a limit on a single object instance. When you make an array of a struct (which List<T> uses internally), the entire array is "one object instance" in the CLR. By using a List<List<T>> or a jagged array, however, each inner list/array is a separate object, which allows you to effectively have an object of any size you wish.
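As a minimal sketch of that chunking idea (MyStruct and ChunkSize are hypothetical stand-ins; a fuller paged implementation follows below):

    using System.Collections.Generic;

    struct MyStruct { public double A, B; }   // hypothetical 16-byte element

    class Chunked
    {
        // Chosen so that ChunkSize * 16 bytes stays comfortably under 2 GB.
        private const int ChunkSize = 1000000;
        private readonly List<MyStruct[]> _chunks = new List<MyStruct[]>();

        public void AddChunk() { _chunks.Add(new MyStruct[ChunkSize]); }

        // Each inner array is its own CLR object, so only it must obey the 2 GB cap.
        public MyStruct this[long index]
        {
            get { return _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
            set { _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
        }
    }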
The CLR team actually blogged about this, and provided a sample BigArray<T> implementation that acts like a single List<T> but does the "block" management internally for you. This is another option for getting >2 GB lists.
Note that .NET 4.5 will have the option to allow objects larger than 2 GB on x64, but it will be something you have to explicitly opt in to. The HugeList<T> below takes the same paged approach, keeping a table of fixed-size pages so that no single array ever nears the limit:
using System;

class HugeList<T>
{
    private const int PAGE_SIZE = 102400;    // items per page; each page is one small array
    private const int ALLOC_STEP = 1024;     // slots added to the page table when it fills up

    private T[][] _rowIndexes;               // page table: one entry per allocated page
    private int _currentPage = -1;
    private int _nextItemIndex = PAGE_SIZE;  // forces a page allocation on the first Add
    private int _pageCount = 0;
    private int _itemCount = 0;

    #region Internals

    // Advances to a fresh page, growing the page table first if it's full.
    private void AddPage()
    {
        if (++_currentPage == _pageCount)
            ExtendPages();

        _rowIndexes[_currentPage] = new T[PAGE_SIZE];
        _nextItemIndex = 0;
    }

    // Grows the page table by ALLOC_STEP slots; existing pages are not copied,
    // only the (small) array of page references is.
    private void ExtendPages()
    {
        if (_rowIndexes == null)
        {
            _rowIndexes = new T[ALLOC_STEP][];
        }
        else
        {
            T[][] rowIndexes = new T[_rowIndexes.Length + ALLOC_STEP][];
            Array.Copy(_rowIndexes, rowIndexes, _rowIndexes.Length);
            _rowIndexes = rowIndexes;
        }

        _pageCount = _rowIndexes.Length;
    }

    #endregion Internals

    #region Public

    public int Count
    {
        get { return _itemCount; }
    }

    public void Add(T item)
    {
        if (_nextItemIndex == PAGE_SIZE)
            AddPage();

        _itemCount++;
        _rowIndexes[_currentPage][_nextItemIndex++] = item;
    }

    // Page number is index / PAGE_SIZE; the offset within that page is index % PAGE_SIZE.
    public T this[int index]
    {
        get { return _rowIndexes[index / PAGE_SIZE][index % PAGE_SIZE]; }
        set { _rowIndexes[index / PAGE_SIZE][index % PAGE_SIZE] = value; }
    }

    #endregion Public
}
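Usage then looks just like an ordinary list. As a rough sketch (the 16-byte Sample struct is a hypothetical stand-in, sized so that ~262 million items is about the 4.2 GB from the question; running it needs a 64-bit process with that much memory free):

    struct Sample { public double X, Y; }   // hypothetical 16-byte payload

    class Program
    {
        static void Main()
        {
            var list = new HugeList<Sample>();

            // ~4.2 GB of structs in total, yet no single allocation here is
            // larger than one PAGE_SIZE page, so the 2 GB limit is never hit.
            for (int i = 0; i < 262500000; i++)
                list.Add(new Sample());

            Sample middle = list[list.Count / 2];
            System.Console.WriteLine(list.Count);
        }
    }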