Question
I'm using matplotlib to produce PDF figures. However, even the simplest figures produce relatively large files; the MWE below produces a file of almost 1 MB. I've become aware that the large file size is due to matplotlib fully embedding all the fonts used. Since I'm going to produce quite a few plots and would like to reduce the file sizes, I'm wondering:
Main question:
Is there a way to get matplotlib to embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.
Things considered so far:
- A vector graphics editor can readily be used to export a PDF including font subsets (as well as not including fonts at all), but having to perform this step for every file (revision) appears unnecessarily tedious.
- Similarly, I've read about post-processing PDF files (e.g. using Ghostscript), though the effort seems comparable.
- I tried setting 'pdf.fonttype' = 3, which does indeed produce considerably smaller files. However, I'd like to keep the text modifiable in vector graphics editors, which doesn't seem to work in this case (for example, minus signs are not saved as text).
Since it is easy, though labor-intensive, to produce files with embedded subsets using external software, is it somehow possible to achieve this directly in matplotlib? Any help would be greatly appreciated.
MWE
import matplotlib as mpl
import matplotlib.pyplot as plt

# Setup
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font', family='Arial', size=12)

# Create a figure containing some text
fig, ax = plt.subplots(figsize=(2, 2))
# The backslash of \mathrm is doubled so Python does not treat \m as an escape
ax.semilogy(1, 1, 's', label='Text\n$M_\\mathrm{ath}$')
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')
Environment: matplotlib 3.1.1
Answer 1:
The PGF backend helps to reduce the PDF file size dramatically. Just add mpl.use('pgf') to your code (this backend requires a LaTeX installation). In my environment, this change leads to the following:
- File size decreases from 817K to 21K (about 40 times smaller!).
- Execution time increases from 1s to 3s.
However, for real figures, the execution time often decreases along with the file size.
The reduction in PDF size is attributed to embedding subsets of fonts.
$ pdffonts pdf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
ArialMT                      CID TrueType      yes no  yes          14  0
DejaVuSerif-Italic           CID TrueType      yes no  yes          23  0
DejaVuSerif                  CID TrueType      yes no  yes          32  0
$ pdffonts pgf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
KECVVY+ArialMT               CID TrueType      yes yes yes           7  0
EFAAMX+CMR12                 Type 1C           yes yes yes           8  0
EHYQVR+CMSY8                 Type 1C           yes yes yes           9  0
UVNOSL+CMR8                  Type 1C           yes yes yes          10  0
FDPQQI+CMMI12                Type 1C           yes yes yes          11  0
DGIYWD+DejaVuSerif           CID TrueType      yes yes yes          13  0
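The subset status shown by pdffonts can also be checked from Python. What follows is only a minimal sketch: it relies on the PDF naming convention that a subset font name starts with a tag of six uppercase letters followed by a plus sign (as in KECVVY+ArialMT above), and it assumes font names contain no spaces. The helper names is_subset_name and non_subset_fonts are made up for this illustration.

```python
import re

# A subset font name carries a tag of six uppercase letters plus '+',
# e.g. 'KECVVY+ArialMT'.
SUBSET_TAG = re.compile(r'^[A-Z]{6}\+')


def is_subset_name(font_name):
    """Return True if the font name starts with a subset tag."""
    return bool(SUBSET_TAG.match(font_name))


def non_subset_fonts(pdffonts_output):
    """List font names from pdffonts output that lack a subset tag."""
    names = []
    for line in pdffonts_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not fields or fields[0] == 'name' or set(fields[0]) == {'-'}:
            continue
        if not is_subset_name(fields[0]):
            names.append(fields[0])
    return names


sample = """\
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
ArialMT                      CID TrueType      yes no  yes          14  0
KECVVY+ArialMT               CID TrueType      yes yes yes           7  0
"""
print(non_subset_fonts(sample))  # ['ArialMT']
```

For a real file, the text could come from subprocess.run(['pdffonts', 'test.pdf'], capture_output=True, text=True).stdout, provided pdffonts is installed and on the PATH.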
Another option is to produce an EPS file (using the PostScript backend) and convert it to PDF, e.g., with epstopdf (which uses the Ghostscript interpreter). This approach reduces the PDF file to 9K. However, it is worth noting that the PS backend does not support transparency.
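The EPS route could be scripted along these lines. This is a sketch under assumptions, not a tested recipe: epstopdf has to be installed and on the PATH, and the helper names default_pdf_path, epstopdf_command and eps_to_pdf are hypothetical. The --outfile option of epstopdf is used to name the output explicitly.

```python
import os
import subprocess


def default_pdf_path(eps_path):
    """Place the PDF next to the EPS file, with the extension swapped."""
    return os.path.splitext(eps_path)[0] + '.pdf'


def epstopdf_command(eps_path, pdf_path=None):
    """Build the epstopdf command line as a list of arguments."""
    pdf_path = pdf_path or default_pdf_path(eps_path)
    return ['epstopdf', '--outfile=' + pdf_path, eps_path]


def eps_to_pdf(eps_path, pdf_path=None):
    """Convert an existing EPS file to PDF and return the PDF path."""
    pdf_path = pdf_path or default_pdf_path(eps_path)
    # check=True raises CalledProcessError if epstopdf reports a failure.
    subprocess.run(epstopdf_command(eps_path, pdf_path), check=True)
    return pdf_path
```

With the figure from the question, fig.savefig('test.eps') followed by eps_to_pdf('test.eps') would be expected to leave test.pdf next to the EPS file.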
Answer 2:
Leaving this here in case anybody else is looking for something similar: in the end, I opted for Ghostscript. Because of the extra step it is not exactly what I was looking for, but at least it can be automated:
import os
import subprocess

def gs_opt(filename):
    """Shrink a PDF in place by rewriting it with Ghostscript."""
    # splitext keeps directories and extra dots in the name intact
    filenameTmp = os.path.splitext(filename)[0] + '_tmp.pdf'
    gs = ['gswin64',                       # Ghostscript executable on 64-bit Windows
          '-sDEVICE=pdfwrite',
          '-dEmbedAllFonts=false',
          '-dSubsetFonts=true',            # Create font subsets (default)
          '-dPDFSETTINGS=/prepress',       # Image resolution
          '-dDetectDuplicateImages=true',  # Embeds images used multiple times only once
          '-dCompressFonts=true',          # Compress fonts in the output (default)
          '-dNOPAUSE',                     # No pause after each page
          '-dQUIET',                       # Suppress output
          '-dBATCH',                       # Automatically exit
          '-sOutputFile=' + filenameTmp,   # Save to temporary output
          filename]                        # Input file
    subprocess.run(gs, check=True)         # Create temporary file; raise if Ghostscript fails
    os.replace(filenameTmp, filename)      # Replace the input file with the temporary file
And then calling
filename = 'test.pdf'
plt.savefig(filename)
gs_opt(filename)
This will save the figure as test.pdf, use Ghostscript to create a temporary, optimized test_tmp.pdf, delete the initial file and rename the optimized file to test.pdf.
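The gs_opt function hard-codes the Windows executable name gswin64. On other systems the name differs, so one possible way to pick it at run time is sketched here. The candidate list reflects the usual Ghostscript executable names (the Windows builds and gs on Linux and macOS); the function name find_ghostscript and its which parameter are assumptions made for this example, not part of the answer.

```python
import shutil

# Usual Ghostscript executable names: Windows console and GUI builds first,
# then the name used on Linux and macOS.
GS_CANDIDATES = ('gswin64c', 'gswin64', 'gswin32c', 'gs')


def find_ghostscript(candidates=GS_CANDIDATES, which=shutil.which):
    """Return the path of the first Ghostscript executable found on PATH."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    raise FileNotFoundError('no Ghostscript executable found on PATH')
```

The returned path could then take the place of the 'gswin64' entry at the start of the argument list.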
Compared to exporting the file with a vector graphics editor, the PDF created by Ghostscript is still a few times larger (typically 4-5 times). However, it reduces the file size to between 1/5 and 1/10 of the original. It's something.
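To see what ratio a given figure actually achieves, the sizes can be compared directly. A small sketch, assuming a copy of the unoptimized PDF is kept under a different name (gs_opt overwrites its input); size_ratio and report are illustrative names.

```python
import os


def size_ratio(original_path, optimized_path):
    """Return optimized size divided by original size (smaller is better)."""
    original = os.path.getsize(original_path)
    optimized = os.path.getsize(optimized_path)
    if original == 0:
        raise ValueError('original file is empty: ' + original_path)
    return optimized / original


def report(original_path, optimized_path):
    """Format a one-line summary of the size change."""
    ratio = size_ratio(original_path, optimized_path)
    return '{}: {:.0%} of original size'.format(optimized_path, ratio)
```

For example, copying test.pdf to test_raw.pdf with shutil.copy before calling gs_opt('test.pdf') would allow print(report('test_raw.pdf', 'test.pdf')) afterwards.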
Source: https://stackoverflow.com/questions/60076026/reducing-file-sizes-of-pdfs-created-using-matplotlib-by-changing-font-embedding