How to convert PDF files to images

庸人自扰 2020-11-27 14:46

I need to convert PDF files to images. If the PDF file has multiple pages, I need a single image that contains all of the pages.

Is there an open sou

12 Answers
  • 2020-11-27 15:14

    Apache PDFBox also works great for me.

    Usage with the command-line tool (the input PDF is the last argument; input.pdf is a placeholder):

    java -jar pdfbox-app-2.0.19.jar PDFToImage -quality 1.0 -dpi 150 -prefix out_dir/page -format png input.pdf
    
  • 2020-11-27 15:15

    The PDF engine used in Google Chrome, called PDFium, is open source under the "BSD 3-clause" license. I believe this allows redistribution when used in a commercial product.

    There is a .NET wrapper for it called PdfiumViewer (NuGet) which works well to the extent I have tried it. It is under the Apache license which also allows redistribution.

    (Note that this is NOT the same wrapper as https://pdfium.patagames.com/, which requires a commercial license.)

    (There is one other PDFium .NET wrapper, PDFiumSharp, but I have not evaluated it.)

    In my opinion, this may be the best choice so far among the open-source (free as in beer) PDF libraries that do the job without restricting closed-source or commercial use of the software that uses them. To the best of my knowledge, nothing else in the answers here satisfies that criterion.

  • 2020-11-27 15:20

    You can use Ghostscript to convert PDF to images.

    To use Ghostscript from .NET, take a look at the Ghostscript.NET library (a managed wrapper around the Ghostscript library).

    To produce an image from a PDF with Ghostscript.NET, take a look at RasterizerSample.

    To combine multiple images into a single image, check out this sample: http://www.niteshluharuka.com/2012/08/combine-several-images-to-form-a-single-image-using-c/#
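
    Since the question asks for one image containing every page, here is a minimal sketch of the combining step using only System.Drawing. It assumes the pages have already been rasterized to Image objects by whichever library you choose; the PageStitcher class and its method names are made up for this illustration and are not part of Ghostscript.NET or any other library mentioned here.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Hypothetical helper for illustration only.
public static class PageStitcher
{
    // Computes the canvas needed to stack pages top to bottom:
    // the width of the widest page and the sum of all page heights.
    public static Size ComputeCanvasSize(IEnumerable<Size> pageSizes)
    {
        int width = 0;
        int height = 0;
        foreach (Size s in pageSizes)
        {
            width = Math.Max(width, s.Width);
            height += s.Height;
        }
        return new Size(width, height);
    }

    // Draws every page onto one tall bitmap, left-aligned, in list order.
    // The caller owns the returned Bitmap and must dispose it.
    public static Bitmap StackVertically(IReadOnlyList<Image> pages)
    {
        if (pages == null || pages.Count == 0)
            throw new ArgumentException("At least one page image is required.", nameof(pages));

        Size canvas = ComputeCanvasSize(pages.Select(p => p.Size));
        var result = new Bitmap(canvas.Width, canvas.Height);
        using (Graphics g = Graphics.FromImage(result))
        {
            // Fill the area beside narrower pages with white.
            g.Clear(Color.White);
            int y = 0;
            foreach (Image page in pages)
            {
                // Passing the pixel size explicitly avoids DPI-based rescaling.
                g.DrawImage(page, 0, y, page.Width, page.Height);
                y += page.Height;
            }
        }
        return result;
    }
}
```

    Call PageStitcher.StackVertically(pages) and save the returned Bitmap with Save(path, ImageFormat.Png). For long documents the combined bitmap can get very large, so a lower rendering DPI may be needed.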

  • 2020-11-27 15:21

    I kind of bumped into this project at SourceForge. It seems to me it's still active.

    1. PDF convert to JPEG at SourceForge
    2. Developer's site

    My two cents.

  • 2020-11-27 15:28

    The thread "converting PDF file to a JPEG image" is suitable for your request.

    One solution is to use a third-party library. ImageMagick is very popular and freely available. You can get a .NET wrapper for it here. The original ImageMagick download page is here.

    • Convert PDF pages to image files using the Solid Framework (dead link; the deleted document is available on the Internet Archive).
    • Convert PDF to JPG Universal Document Converter
    • 6 Ways to Convert a PDF to a JPG Image

    You can also take a look at the thread "How to open a page from a pdf file in pictureBox in C#".

    If you use this process to convert a PDF to TIFF, you can use this class to retrieve the bitmaps from the TIFF file.

    using System;
    using System.Collections;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;

    public class TiffImage
    {
        private string myPath;
        private Guid myGuid;
        private FrameDimension myDimension;
        // One Bitmap per TIFF page, in page order.
        public ArrayList myImages = new ArrayList();
        private int myPageCount;

        public TiffImage(string path)
        {
            myPath = path;
            // Dispose the stream and the source image even if decoding throws.
            using (FileStream fs = new FileStream(myPath, FileMode.Open, FileAccess.Read))
            using (Image myImage = Image.FromStream(fs))
            {
                myGuid = myImage.FrameDimensionsList[0];
                myDimension = new FrameDimension(myGuid);
                myPageCount = myImage.GetFrameCount(myDimension);
                for (int i = 0; i < myPageCount; i++)
                {
                    myImage.SelectActiveFrame(myDimension, i);
                    // Copy the active frame into an independent Bitmap so it
                    // stays valid after the stream is closed.
                    myImages.Add(new Bitmap(myImage));
                }
            }
        }
    }
    

    Use it like so:

    private void button1_Click(object sender, EventArgs e)
    {
        TiffImage myTiff = new TiffImage("D:\\Some.tif");
        // pictureBox1..pictureBox3 are PictureBox controls. myImages[i] holds the
        // Bitmap for page i, so this assumes the TIFF has at least three pages.
        this.pictureBox1.Image = (Bitmap)myTiff.myImages[0];
        this.pictureBox2.Image = (Bitmap)myTiff.myImages[1];
        this.pictureBox3.Image = (Bitmap)myTiff.myImages[2];
    }
    
  • 2020-11-27 15:29

    There is a free NuGet package (Pdf2Image) that extracts PDF pages to JPG files or to a collection of images (List<Image>) in a single line each:

    string file = "c:\\tmp\\test.pdf";

    List<System.Drawing.Image> images = PdfSplitter.GetImages(file, PdfSplitter.Scale.High);

    PdfSplitter.WriteImages(file, "c:\\tmp", PdfSplitter.Scale.High, PdfSplitter.CompressionLevel.Medium);
    

    The source is also available on GitHub: Pdf2Image
