Basically, what is the best way to go about storing and using dense matrices in python?
I have a project that generates similarity metrics between every item in an array
For the 20,000 x 20,000 you are looking at 12GB of RAM?
Aren't you going to end up in swap hell trying to work with a 12GB in win32 which artificially limits the memory that the OS can address?
I'd be looking for an OS that can support the 12GB (32 bin win 2003 server can if you need to stick with 32bit windows), but a 64 bit machine with 64bit OS and 16GB of RAM would seem a better fit.
Good excuse for an upgrade :)
64 bit numpy can support your matrix
Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.zeros((20000,20000),dtype=np.uint16)
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint16)