I am building a large data dictionary from a set of text files. As I read in the lines and process them, I append(dataline) to a list.

At some point the script raises a MemoryError.
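For reference, a minimal sketch of the pattern described above (the input file names, the processing step, and the dataline structure are all assumptions for illustration):

    datalines = []
    for path in ["part1.txt", "part2.txt"]:      # hypothetical input files
        with open(path) as f:
            for line in f:
                dataline = line.strip().split()  # hypothetical processing
                datalines.append(dataline)       # entire dataset is held in memory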
I had a similar problem using a 32-bit version of Python in a 64-bit Windows environment. I tried the 64-bit Windows version of Python and very quickly ran into trouble with the Scipy libraries compiled for 64-bit Windows.

The totally free solution that I implemented was:

1) Install VirtualBox
2) Install CentOS 5.6 on the VM
3) Get the Enthought Python Distribution (free 64-bit Linux version)

Now all of my Numpy, Scipy, and Matplotlib dependent Python code can use as much memory as I have RAM and available Linux swap.
If you're using a 32-bit build of Python, you might want to try a 64-bit version.
It is possible for a process to address at most 4GB of RAM using 32-bit addresses, but typically (depending on the OS), one gets much less. It sounds like your Python process may be hitting this limit. 64-bit addressing removes this limitation.
Edit: Since you're asking about Windows, the following page is relevant: Memory Limits for Windows Releases. As you can see, the limit per 32-bit process is 2, 3, or 4GB depending on the OS version and configuration.
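As a quick way to check which build you're running, a sketch like this works (struct.calcsize("P") and sys.maxsize are standard-library checks):

    import struct
    import sys

    # Pointer size: 8 bytes on a 64-bit build, 4 bytes on a 32-bit build
    print(struct.calcsize("P") * 8, "bit interpreter")
    print("64-bit" if sys.maxsize > 2**32 else "32-bit")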
As has already been mentioned, you'll need a 64-bit Python (on a 64-bit version of Windows).

Be aware that you'll probably face a lot of conflicts and problems with some of the basic packages you might want to work with. To avoid this problem, I'd recommend Anaconda from Continuum Analytics. I'd advise you to look into it :)
I had a similar problem when evaluating an expression containing large numpy arrays (actually, one was sparse). I was doing this on a machine with 64GB of memory, of which only about 8GB was in use, so I was surprised to get the MemoryError.
It turned out that my problem was array shape broadcasting: I had inadvertently duplicated a large dimension.
It went something like this:
I had passed an array of shape (286577, 1) where I was expecting (286577). It was used together with an array of shape (286577, 130). Because I was expecting (286577), I applied [:,newaxis] in the expression to bring the first array to (286577, 1) so it would be broadcast to (286577, 130). Since that array already had shape (286577, 1), however, [:,newaxis] produced shape (286577, 1, 1), and the two arrays were broadcast to shape (286577, 286577, 130)... of doubles. With two such arrays, that comes to about 80GB!
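To make the pitfall concrete, here is a small sketch with tiny, made-up shapes (so it runs without exhausting memory) showing how the extra axis duplicates the large dimension:

    import numpy as np

    a = np.ones((5, 1))   # intended shape was (5,)
    b = np.ones((5, 3))

    # Applied to a 1-D array, [:, np.newaxis] gives (5, 1) as intended...
    print(np.ones(5)[:, np.newaxis].shape)   # (5, 1)

    # ...but applied to the accidental (5, 1) input it adds another axis:
    c = a[:, np.newaxis]
    print(c.shape)                           # (5, 1, 1)

    # Broadcasting (5, 1, 1) against (5, 3) duplicates the first dimension:
    print((c * b).shape)                     # (5, 5, 3)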
If you're open to restructuring the code instead of throwing more memory at it, you might be able to get by with this:

    data = (processraw(raw) for raw in lines)

where lines is either a list of lines or file.xreadlines() or similar.
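For example, a sketch along those lines (the processraw helper, the input file name, and the aggregation into counts are hypothetical stand-ins for your own processing):

    def processraw(raw):
        # hypothetical processing step
        return raw.strip().split("\t")

    counts = {}
    with open("data.txt") as f:                # assumed input file
        data = (processraw(raw) for raw in f)  # generator: one line in memory at a time
        for fields in data:
            key = fields[0]
            counts[key] = counts.get(key, 0) + 1

Because the generator expression is evaluated lazily, only the line currently being processed is held in memory, rather than the whole list of processed lines.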