Can pdfbox extract vector images?

╄→尐↘猪︶ㄣ submitted on 2019-12-10 11:51:59

Question


As per my understanding,

1. .eps format images are vector images.
2. When we draw something in Word (like a flowchart), it is stored as a vector image.

I am almost sure about the first, not sure about the second. Please correct me if I am wrong.

Assuming these two things, when a LaTeX file (where .eps images are inserted) or a Word file (that contains vector images) is converted into PDF, do the images get converted into raster images?

Also, I think PDFBox/xpdf can only extract raster images from a PDF (as they are embedded as XObjects), not vector images. Is that understanding correct? This question on Stack Overflow is related, but has not been answered yet.


Answer 1:


Your point 1 is incorrect: EPS files are PostScript programs. They may contain vector information, text, image data, or all of the above.

On point 2: in PDF there is no such thing as a 'vector image'. An image means a bitmap and therefore cannot be vector.

If you convert a PostScript program to a PDF file, the result depends entirely on the conversion program you use. In general, vectors will be retained as vectors and text as text. However, it is entirely possible that an application might render the entire PostScript program and insert the result as an image in the PDF.

So the answer to your first question ("do the images get converted into raster images") is 'maybe, but probably not'.

I'm afraid I have no idea about the capabilities of PDFBox/xpdf. However, since collections of vectors are not necessarily arranged as 'images' in any atomic fashion (they could be held as Form XObjects or Patterns, for example), there isn't any obvious way to know when to stop extracting. And what format would you store the result in anyway?
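To illustrate why vector drawing has no atomic boundary, here is a rough sketch in plain Java (no PDFBox involved) that counts operator kinds in a hand-written, uncompressed page content stream. The class name `ContentStreamSketch`, the method `classify`, and the sample stream are made up for this illustration. The whitespace tokenizer is a simplification: it would miscount string literals containing spaces, and real content streams are usually compressed, so a proper PDF parser would be needed in practice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of PDFBox or xpdf.
final class ContentStreamSketch {
    // Path construction and painting operators defined by the PDF specification.
    private static final Set<String> PATH_OPS = Set.of(
        "m", "l", "c", "v", "y", "h", "re",
        "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n");
    // Text-showing operators.
    private static final Set<String> TEXT_OPS = Set.of("Tj", "TJ", "'", "\"");

    // Counts path, text-showing, and XObject-painting operators in an
    // uncompressed content stream, using naive whitespace tokenization.
    static Map<String, Integer> classify(String stream) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("path", 0);
        counts.put("text", 0);
        counts.put("xobject", 0);
        for (String token : stream.trim().split("\\s+")) {
            if (PATH_OPS.contains(token)) {
                counts.merge("path", 1, Integer::sum);
            } else if (TEXT_OPS.contains(token)) {
                counts.merge("text", 1, Integer::sum);
            } else if (token.equals("Do")) {
                // Do paints a named XObject: an image or a Form XObject.
                // Telling them apart requires the page's resource dictionary.
                counts.merge("xobject", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample: a stroked triangle, one text string,
        // one XObject, and a filled rectangle.
        String sample = "q 1 0 0 1 50 50 cm 0 0 m 100 0 l 100 100 l h S Q "
            + "BT /F1 12 Tf 10 10 Td (Hi) Tj ET "
            + "q 200 0 0 150 0 0 cm /Im1 Do Q "
            + "10 10 80 40 re f";
        System.out.println(classify(sample));
    }
}
```

In this sample the raster image is a single named object (`/Im1 Do`) that can be pulled out as a unit, while the triangle and the rectangle are just path operators sitting in the same stream as everything else. Nothing marks where one 'drawing' ends and the next begins.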



Source: https://stackoverflow.com/questions/14846560/can-pdfbox-extract-vector-images
