I have a medium-sized array (e.g. 1500x3000) that I want to plot at scale since it is an image. However, the vertical and horizontal scales are very different. For simplific
Firstly, when you're saving as a .pdf, you are implicitly using the pdf backend, even though you might be specifying other backends in your options. This means your image is saved in vector format and dpi is therefore pretty meaningless. At any resolution, if I load up your PDF in a decent viewer (I used Inkscape, others are available), I can clearly see the stripes; I actually found them easier to observe if you set every second row to zero. All the PDFs generated contain complete information to reproduce the stripes and are consequently virtually identical. As you specify figsize=(45, 10), all the generated PDFs have a suggested display size of 45 inches x 10 inches.
If I specify png as the image type, I see a difference in file size based on the dpi parameter, which I think is what you're expecting. The 100 dpi image has 4500000 pixels, the 200 dpi image has 18000000 pixels (4x as many) and the 300 dpi image has 40500000 pixels (9x as many). You will notice that 4500000 == 1500 x 3000, i.e. one pixel per member of your original array. It follows, then, that the larger dpi settings don't really gain you any further definition; instead, your stripes are 2 or 3 pixels wide respectively instead of 1.
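The pixel counts above follow directly from figure size times dpi. A minimal sketch of that arithmetic (the helper name pixel_dimensions is made up for illustration, and it assumes the saved raster is exactly figsize x dpi, which would not hold if something like bbox_inches='tight' cropped the output):

```python
def pixel_dimensions(figsize_inches, dpi):
    """Return (width_px, height_px) for a raster figure of the given size in inches."""
    width_in, height_in = figsize_inches
    return round(width_in * dpi), round(height_in * dpi)


# Same figure size as in the question, at the three dpi settings tried.
for dpi in (100, 200, 300):
    w, h = pixel_dimensions((45, 10), dpi)
    print(f"{dpi} dpi: {w} x {h} = {w * h} pixels")
```

The totals are 4500000, 18000000 and 40500000 pixels, i.e. 1x, 4x and 9x the number of elements in a 1500 x 3000 array.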
I think what you want to do is effectively plot every column 10 times, so you get an image 1500 x 30000 pixels. To do this, using all your own code, you could use np.repeat to do something like the following:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
R, C = 1500, 3000
DATA = np.random.random((R, C))
DATA[::2, :] = 0 # make every other line plain white
Yi, Xi = 1, 10 # increment
DATA = np.repeat(DATA, Xi, axis=1)
DATA = np.repeat(DATA, Yi, axis=0) # axis is needed: without it np.repeat flattens the array to 1-D
CMP = 'seismic'
ImageFormat ='pdf'
Name = 'Image'
DataRange = (np.absolute(DATA)).max() # I want my data centred on 0
EXTENT = [0, Xi*C, 0 ,Yi*R]
NORM = matplotlib.colors.Normalize(vmin =-DataRange, vmax= DataRange, clip =True)
for i in range(1,4):
    Fig=plt.figure(figsize=(45, 10), dpi = 100*i, tight_layout=True)
    Fig.suptitle(Name+str(i)+'00DPI')
    ax = Fig.add_subplot(1, 1, 1)
    Plot = ax.imshow(DATA, cmap=plt.get_cmap(CMP), norm = NORM, extent = EXTENT, aspect = 1, interpolation='none')
    ax.set_xlabel('metres')
    ax.set_ylabel('metres')
    Fig.savefig(Name+str(i)+'00DPI.'+ImageFormat, format = ImageFormat, dpi = Fig.dpi)
    plt.close()
Caveat: This is a memory-intensive solution; there may be better ways out there. If you don't need the vector graphics output of pdf, you can change your ImageFormat variable to png.
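To put a rough number on the memory cost, here is a small sketch. The helper repeated_array_bytes is hypothetical, and it assumes 8-byte float64 elements, which is what np.random.random produces. It counts only the repeated array itself, not any copies matplotlib may make while rendering.

```python
def repeated_array_bytes(rows, cols, y_repeat, x_repeat, itemsize=8):
    """Bytes needed for a rows x cols array after repeating each row and column."""
    return rows * y_repeat * cols * x_repeat * itemsize


# 1500 x 3000 array, rows repeated once and columns repeated 10 times.
size = repeated_array_bytes(1500, 3000, 1, 10)
print(f"{size / 1e6:.0f} MB")
```

For the 1500 x 3000 example with Xi = 10, that comes to 360 MB, ten times the size of the original array.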
It strikes me that the other thing you might be concerned with is giving the picture the appropriate aspect ratio (i.e. 20 times as wide as it is high). This you're already doing. So, if you look at each representation of a pixel in the pdf, they are rectangular (10 times as wide as they are tall), not square.
Running your example, everything looks good in matplotlib after zooming: no matter the resolution, results are the same and I see one pixel per axis unit. Also, trying with smaller arrays, pdfs (or other formats) work well.
This is my explanation: when you set the figure dpi, you are setting the dpi of the entire figure (not only the data area). On my system, this results in the plot area occupying about 20% of the figure's height. If you set 300 dpi and a height of 10 inches, you get a total of 300x10x0.2=600 pixels for the vertical data axis, which is not enough to represent 1500 points; this explains to me why the output must be resampled. Note that reducing the width sometimes incidentally works because it changes the fraction of the figure occupied by the data plot.
So you have to increase the dpi and also set interpolation='none' (it shouldn't matter if the resolution is set perfectly, but it matters if it is only close). You can also adjust the plot position and size so that it takes up a larger part of the figure. Going back to the optimal resolution settings, ideally you want the number of pixels along the axis to be a multiple of the number of data points, otherwise some kind of interpolation must happen (think about how you would plot two points on three pixels, or vice versa).
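As a rough illustration of the arithmetic, here is a sketch with made-up helper names (axes_pixels, min_dpi_for_rows). The 0.2 axes fraction is just the value I observed on my system and will differ with other layouts.

```python
def axes_pixels(fig_height_in, dpi, axes_fraction):
    """Vertical pixels available to the data area of the figure."""
    return fig_height_in * dpi * axes_fraction


def min_dpi_for_rows(n_rows, fig_height_in, axes_fraction, pixels_per_row=1):
    """Smallest dpi that gives each data row a whole number of pixels."""
    return n_rows * pixels_per_row / (fig_height_in * axes_fraction)


available = axes_pixels(10, 300, 0.2)
needed = min_dpi_for_rows(1500, 10, 0.2)
print(f"pixels available at 300 dpi: {available:.0f}")
print(f"dpi needed for 1500 rows: {needed:.0f}")
```

With these numbers, 300 dpi leaves about 600 pixels for 1500 rows, while roughly 750 dpi would give one pixel per row.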
I don't know if the following is the best way to do it, there might be more suitable methods and properties in matplotlib, but I would try something like this to calculate the optimal dpi:
vsize = ax.get_position().size[1] # fraction of figure height occupied by axes
# dpi = data rows / axes height in inches
axesdpi = int(R / (Fig.get_size_inches()[1] * vsize)) # (or Yi*R according to what you want to do)
Then your code (reduced to the first loop), becomes:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
R, C = 1500, 3000
DATA = np.random.random((R, C))
DATA[::2, :] *= -1 # make every other line negative
Yi, Xi = 1, 10 # increment
CMP = 'seismic'
ImageFormat ='pdf'
Name = 'Image'
DataRange = (np.absolute(DATA)).max() # I want my data centred on 0
EXTENT = [0, Xi*C, 0 ,Yi*R]
NORM = matplotlib.colors.Normalize(vmin =-DataRange, vmax= DataRange, clip =True)
for i in (1,):
    print(i)
    Fig=plt.figure(figsize=(45, 10), dpi = 100*i, tight_layout=True)
    Fig.suptitle(Name+str(i)+'00DPI')
    ax = Fig.add_subplot(1, 1, 1)
    Plot = ax.imshow(DATA, cmap=plt.get_cmap(CMP), norm = NORM, extent = EXTENT, aspect = 1, interpolation='none')
    ax.set_xlabel('metres')
    ax.set_ylabel('metres')
    vsize = ax.get_position().size[1] # fraction of figure height occupied by axes
    # dpi = data rows / axes height in inches
    axesdpi = int(R / (Fig.get_size_inches()[1] * vsize)) # (or Yi*R according to what you want to do)
    Fig.savefig(Name+str(axesdpi)+'DPI.'+ImageFormat, format = ImageFormat, dpi = axesdpi)
    #plt.close()
This works reasonably for me.