I have been running some code, part of which loads a large 1D NumPy array from a binary file and then alters the array using the numpy.where() function.
Here is a simplified sketch of the relevant part (the file name, dtype, and grid size num below are stand-ins; the real array is much larger):
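    import numpy as np

    num = 256                                        # grid size per dimension (stand-in)
    arr = np.fromfile('data.bin', dtype=np.float64)  # large 1D array read from a binary file

    # cap every value at 1.0, then take the average over the volume
    arr = np.where(arr >= 1.0, 1.0, arr)
    vol_avg = np.sum(arr) / (num**3)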
The drops in CPU usage turned out to be unrelated to Python or NumPy; they were a result of reading from a shared disk, and network I/O was the real culprit. For arrays this large, reading them into memory can be a major bottleneck.
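For anyone hitting the same thing, one quick way to confirm it is to time the read separately from the in-memory work. A minimal sketch, assuming the array is loaded with np.fromfile (swap in however you actually read it):

    import time
    import numpy as np

    t0 = time.perf_counter()
    arr = np.fromfile('data.bin', dtype=np.float64)   # read from the shared disk
    t1 = time.perf_counter()

    arr = np.where(arr >= 1.0, 1.0, arr)              # the in-memory work
    t2 = time.perf_counter()

    print(f"read: {t1 - t0:.1f} s, compute: {t2 - t1:.1f} s")

If the first number dwarfs the second, the disk/network is the bottleneck rather than numpy.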
Did you click in or select text from the console window? That can "hang" the process: the console enters QuickEdit mode and output is paused until the selection is released. Pressing any key can resume the process.
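If that is the cause, you can avoid clicking in the window, press a key to release the selection, or turn QuickEdit off for the session. A minimal sketch of the latter using the Win32 console API via ctypes (Windows only; the flag values are the documented console-mode constants):

    import ctypes

    kernel32 = ctypes.windll.kernel32
    ENABLE_QUICK_EDIT_MODE = 0x0040
    ENABLE_EXTENDED_FLAGS = 0x0080
    STD_INPUT_HANDLE = -10

    # clear the QuickEdit bit on the console's input handle
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = ctypes.c_uint32()
    kernel32.GetConsoleMode(handle, ctypes.byref(mode))
    kernel32.SetConsoleMode(handle, (mode.value & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS)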
np.where creates a copy of the full array there and assigns it back into arr. So we could save memory by avoiding that copying step, like so -
    vol_avg = (np.sum(arr) - (arr[arr >= 1.0] - 1.0).sum()) / (num**3)
We are using boolean indexing to select the elements that are greater than or equal to 1.0, taking their offsets from 1.0, summing those up, and subtracting that from the total sum. Hopefully the number of such exceeding elements is small, so this won't incur any noticeable extra memory. I am assuming this hanging issue with large arrays is a memory-based one.
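A quick sanity check on a tiny array, just to show that the copy-free expression matches the np.where version (the size and seed here are arbitrary placeholders):

    import numpy as np

    num = 4                                      # tiny stand-in for the real grid size
    rng = np.random.default_rng(0)
    arr = rng.uniform(0.5, 1.5, size=num**3)

    # original approach: make a capped copy of the whole array, then average
    capped = np.where(arr >= 1.0, 1.0, arr)
    vol_avg_copy = np.sum(capped) / (num**3)

    # suggested approach: only the mask and the elements >= 1.0 are materialised
    vol_avg_nocopy = (np.sum(arr) - (arr[arr >= 1.0] - 1.0).sum()) / (num**3)

    print(np.isclose(vol_avg_copy, vol_avg_nocopy))   # True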