pypdf

Extract images from PDF without resampling, in Python?

Submitted by 亡梦爱人 on 2019-11-26 02:50:34
Question: How might one extract all images from a PDF document at native resolution and in their native format (that is, extract TIFF as TIFF, JPEG as JPEG, and so on, without resampling)? Layout is unimportant; I don't care where the source image is located on the page. I'm using Python 2.7 but can use 3.x if required.

Answer 1: Often in a PDF, the image is simply stored as-is. For example, a PDF with a JPG inserted will have a range of bytes somewhere in the middle that, when extracted, is a valid JPG file. You can use
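
As an illustration of the byte-range observation in the answer above, here is a minimal sketch that scans the raw bytes of a PDF for stream objects whose contents begin with the JPEG start-of-image marker and end with the end-of-image marker. It is not necessarily the approach the answer goes on to describe. The function name `extract_jpegs` is hypothetical, and the sketch assumes the JPEG is stored unmodified (as is typical for streams using the `DCTDecode` filter) and that the image data does not itself contain the byte sequence `endstream`. Images stored with other filters would not be found this way.

```python
def extract_jpegs(pdf_bytes):
    """Return a list of byte strings that look like embedded JPEG files.

    Hypothetical helper: a naive scan, not a real PDF parser.
    """
    images = []
    pos = 0
    while True:
        # A PDF stream object is delimited by the keywords
        # "stream" ... "endstream".
        start = pdf_bytes.find(b"stream", pos)
        if start < 0:
            break
        end = pdf_bytes.find(b"endstream", start)
        if end < 0:
            break
        body = pdf_bytes[start + len(b"stream"):end]

        # JPEG data starts with FF D8 FF and ends with FF D9.
        # Allow for the one or two newline bytes after "stream".
        soi = body.find(b"\xff\xd8\xff")
        eoi = body.rfind(b"\xff\xd9")
        if 0 <= soi <= 2 and eoi > soi:
            images.append(body[soi:eoi + 2])

        pos = end + len(b"endstream")
    return images
```

To use it, read the PDF in binary mode (`open(path, "rb").read()`), pass the bytes to `extract_jpegs`, and write each returned item to its own `.jpg` file, also in binary mode. Because the bytes are copied unchanged, no resampling or re-encoding takes place.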