Question
I am trying to plot a very big file (~5 GB) using Python and matplotlib. I am able to load the whole file into memory (the machine has 16 GB in total), but when I plot it with a simple imshow I get a segmentation fault. This is most probably due to the ulimit, which I have set to 15000 but cannot set any higher. I have concluded that I need to plot my array in batches, so I wrote a simple script to do that. My main issue is that when I plot a batch of the big array, the x coordinates always start from 0, so there is no way to overlay the images into one final big image. If you have any suggestions, please let me know. Also, I cannot install new packages such as "Image" on this machine because I lack administrative rights.

Here is a sample of the code, which reads the first 12 rows of my array and makes 3 plots.
```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from optparse import OptionParser


def readalllines(file1, rows, freqs):
    """Read the 5th whitespace-separated column of the first rows*freqs lines."""
    total = rows * freqs
    q = np.zeros(total, dtype=float)
    with open(file1, 'r') as f:
        for i in range(total):
            s = f.readline().split()
            q[i] = float(s[4])
            if i % freqs == 0:
                # progress relative to the number of lines actually requested
                print('\r', int(i * 100.0 / total), 'percent complete', end='', flush=True)
    print()
    return q


parser = OptionParser()
parser.add_option('-f', dest="filename", help="Read dynamic spectrum from FILE", metavar="FILE")
parser.add_option('-t', dest="dtime", help="The time integration used in seconds, default 10", default=10)
parser.add_option('-n', dest="dfreq", help="The bandwidth of each frequency channel in Hz", default=11.92092896)
parser.add_option('-w', dest="reduce", help="The chunker divider in frequency channels, integer, default 16", default=16)
(opts, args) = parser.parse_args()

rows = 12
freqs = 262144
file1 = opts.filename
s = readalllines(file1, rows, freqs)
s = np.reshape(s, (rows, freqs))
s = s.T  # shape (freqs, rows): frequency on y, time on x
print(s.shape)

plt.ion()
for o in range(0, rows, 4):
    fig = plt.figure()
    # without an extent, every batch is drawn at the same x position (columns 0-3)
    plt.imshow(s[:, o:o+4], interpolation='nearest', aspect='auto', cmap=cm.gray_r, origin='lower')
    if o == 0:
        plt.axis([0, rows, 0, freqs])
    # attempt to offset the x ticks by o (this moves the tick positions, not the image)
    fdf, fdff = plt.xticks()
    print(fdf)
    plt.xticks(fdf + o)
    print(plt.xticks())
    plt.draw()
    # attempted stitching with PIL's Image module (not installable on this machine):
    # w, h = fig.canvas.get_width_height()
    # buf = np.frombuffer(fig.canvas.tostring_argb(), dtype=np.uint8)
    # buf = np.roll(buf.reshape(h, w, 4), 3, axis=2)  # ARGB -> RGBA
    # img = Image.frombytes("RGBA", (w, h), buf.tobytes())
    # if prev:
    #     prev.paste(img)
    #     del prev
    # prev = img
plt.colorbar()
plt.show()
```
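A shorter loader for the same data, sketched under the assumption that the file is plain whitespace-separated text with the value of interest in the fifth column (index 4), as `readalllines` assumes, and that NumPy is 1.16 or newer (needed for the `max_rows` argument of `np.loadtxt`). `read_column` is a hypothetical helper name; it returns the already transposed `(freqs, rows)` array.

```python
import numpy as np


def read_column(path, rows, freqs, col=4):
    """Read column `col` (0-based) of the first rows*freqs lines of a text file.

    Returns an array of shape (freqs, rows), i.e. the reshape + transpose
    from the question's script applied in one step.
    """
    # usecols selects a single column; max_rows stops after rows*freqs lines
    q = np.loadtxt(path, usecols=col, max_rows=rows * freqs, dtype=float)
    return q.reshape(rows, freqs).T
```

With the question's settings, `s = read_column(opts.filename, 12, 262144)` would stand in for the `readalllines`, `np.reshape` and `.T` lines.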
Answer 1:
I think you're just missing the `extent=(left, right, bottom, top)` keyword argument in `plt.imshow`.
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0  # To make it clear which side is up, etc.
y[0] = -1
plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()  # colorbar for the most recently drawn image (y)
plt.show()
```
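Applied to the question's loop, the same idea might look like the sketch below: every batch goes onto one shared Axes with `extent=(o, o + batch_width, 0, nrows)`, so batch `o` starts at x = o instead of 0. `plot_in_batches` and its parameters are hypothetical names; the Agg backend is only there so the sketch runs without a display, and a shared `vmin`/`vmax` is assumed so the gray levels match across batches.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop for interactive use
import matplotlib.pyplot as plt


def plot_in_batches(s, batch=4, ax=None):
    """Draw column batches of a 2-D array side by side on one Axes.

    Each batch s[:, o:o+batch] gets extent=(o, o + width, 0, nrows),
    so its x coordinates start at o instead of 0.
    """
    if ax is None:
        ax = plt.gca()
    nrows, ncols = s.shape
    vmin, vmax = s.min(), s.max()  # one colour scale shared by all batches
    im = None
    for o in range(0, ncols, batch):
        chunk = s[:, o:o + batch]
        im = ax.imshow(chunk, interpolation='nearest', aspect='auto',
                       cmap='gray_r', origin='lower', vmin=vmin, vmax=vmax,
                       extent=(o, o + chunk.shape[1], 0, nrows))
    # each imshow resets the limits to its own extent, so restore the full range
    ax.set_xlim(0, ncols)
    ax.set_ylim(0, nrows)
    return im
```

For the question's array, `im = plot_in_batches(s, batch=4)` followed by `plt.colorbar(im)` would give a single figure spanning columns 0 to 12; because the batches share `vmin`/`vmax`, one colorbar is valid for all of them.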
Answer 2:
If you plot an array with more than ~2k pixels across, something in your graphics chain will downsample the image in some way to display it on your monitor. I would recommend downsampling in a controlled way, something like:
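```python
import numpy as np

# placeholder for your own loading/FFT step; make sure data is row-major
data = convert_raw_data_to_fft(args)

def ds_decimate(row, step=100):
    # keep every step-th sample
    return row[::step]

def ds_sum(row, step=100):
    # sum each block of `step` samples (an incomplete trailing block is dropped)
    return np.sum(row[:step * (len(row) // step)].reshape(-1, step), 1)

# as per suggestion from tom10 in comments
def ds_max(row, step=100):
    # keep the maximum of each block of `step` samples
    return np.max(row[:step * (len(row) // step)].reshape(-1, step), 1)

data_plotable = [ds_sum(d) for d in data]  # plug in whichever function you want
```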
Interpolation would also work in place of these functions.
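The functions above work one row at a time. For the question's 2-D array `s` (shape `(freqs, rows)` after the transpose), the same block reduction can be applied to the long frequency axis in one vectorized step. This is a sketch; `block_reduce_rows` is a hypothetical helper, and `np.max` is used as the default so that a narrow peak in a single channel is not averaged away (`np.sum` or `np.mean` also work).

```python
import numpy as np


def block_reduce_rows(a, step, func=np.max):
    """Reduce a 2-D array along axis 0 in blocks of `step` rows.

    Like ds_sum/ds_max, any trailing rows that do not fill a
    complete block are dropped.
    """
    n = (a.shape[0] // step) * step
    # (n, ncols) -> (n // step, step, ncols), then reduce over each block
    return func(a[:n].reshape(-1, step, a.shape[1]), axis=1)
```

For example, `small = block_reduce_rows(s, 128)` turns 262144 channels into 2048, and `plt.imshow(small, aspect='auto', origin='lower', extent=(0, s.shape[1], 0, 128 * small.shape[0]))` keeps the y axis labelled in original channel numbers.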
Answer 3:
Matplotlib is pretty memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.
One solution is to downsample the image before feeding it into matplotlib, as @tcaswell suggests.
I also wrote some wrapper code to do this downsampling automatically, based on your screen resolution. It's at https://github.com/ChrisBeaumont/mpl-modest-image, if it's useful. It also has the advantage that the image is resampled on the fly, so you can still pan and zoom without sacrificing resolution where you need it.
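The core idea of downsampling based on screen resolution can be sketched with plain matplotlib, so no extra packages are needed. This is not mpl-modest-image's code: it decimates once, using the Axes' current size in pixels, so zooming in afterwards will not recover detail the way that library's on-the-fly resampling does. `show_decimated` is a hypothetical name, and plain striding (rather than a block max or sum) is an assumption made for brevity.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def show_decimated(ax, a, **imshow_kwargs):
    """imshow a 2-D array with at most about one sample per screen pixel."""
    # Axes size in display pixels (figure size in inches * dpi * axes fraction)
    h_px = max(int(ax.bbox.height), 1)
    w_px = max(int(ax.bbox.width), 1)
    # smallest integer strides that fit the array into that many pixels
    rstep = max(1, int(np.ceil(a.shape[0] / h_px)))
    cstep = max(1, int(np.ceil(a.shape[1] / w_px)))
    small = a[::rstep, ::cstep]
    # extent keeps the axes labelled in the original row/column coordinates
    return ax.imshow(small, origin='lower', aspect='auto',
                     extent=(0, a.shape[1], 0, a.shape[0]), **imshow_kwargs)
```

On a 300-pixel-tall Axes, a 262144-row array would be strided by 874, so only about 300 rows ever reach matplotlib's image pipeline.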
Source: https://stackoverflow.com/questions/13183460/plot-really-big-file-in-python-5gb-with-x-axis-offset