How to convert PDF to PNG efficiently?

有刺的猬 2020-12-10 19:46

I have the following function to convert a PDF into a series of images (one image per page):

import Quartz

func convertPDF(at sourceURL: URL, to destination         


        
1 answer
  • 2020-12-10 20:16

    After struggling with this for a whole day, I ended up answering my own question.

    The solution is to drop down a level, into the Core Graphics and Image I/O frameworks, and render each PDF page into a bitmap context. This problem lends itself very well to parallelization: each page can be rendered into a bitmap independently of the others, so several pages can be processed on different threads at once.

    import Foundation
    import CoreGraphics
    import CoreServices  // kUTType* constants on macOS
    import ImageIO

    struct ImageFileType {
        var uti: CFString
        var fileExtension: String

        // This list can include anything returned by CGImageDestinationCopyTypeIdentifiers()
        // I'm including only the popular formats here
        static let bmp = ImageFileType(uti: kUTTypeBMP, fileExtension: "bmp")
        static let gif = ImageFileType(uti: kUTTypeGIF, fileExtension: "gif")
        static let jpg = ImageFileType(uti: kUTTypeJPEG, fileExtension: "jpg")
        static let png = ImageFileType(uti: kUTTypePNG, fileExtension: "png")
        static let tiff = ImageFileType(uti: kUTTypeTIFF, fileExtension: "tiff")
    }

    func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: ImageFileType, dpi: CGFloat = 200) throws -> [URL] {
        let pdfDocument = CGPDFDocument(sourceURL as CFURL)!
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo = CGImageAlphaInfo.noneSkipLast.rawValue
        // PDF user space is 72 units per inch
        let scale = dpi / 72.0

        // Build the output URLs up front so the concurrent closure below only
        // reads shared state and never mutates an array from several threads
        let imageName = sourceURL.deletingPathExtension().lastPathComponent
        let urls = (0..<pdfDocument.numberOfPages).map { i in
            destinationURL.appendingPathComponent("\(imageName)-Page\(i + 1).\(fileType.fileExtension)")
        }

        DispatchQueue.concurrentPerform(iterations: urls.count) { i in
            // Page number starts at 1, not 0
            let pdfPage = pdfDocument.page(at: i + 1)!

            // Size of the bitmap in pixels at the requested DPI
            let mediaBoxRect = pdfPage.getBoxRect(.mediaBox)
            let width = Int(mediaBoxRect.width * scale)
            let height = Int(mediaBoxRect.height * scale)

            // Paint a white background, then draw the page scaled to the bitmap
            let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace, bitmapInfo: bitmapInfo)!
            context.interpolationQuality = .high
            context.setFillColor(.white)
            context.fill(CGRect(x: 0, y: 0, width: width, height: height))
            context.scaleBy(x: scale, y: scale)
            context.drawPDFPage(pdfPage)

            // Encode the bitmap and write it to disk with Image I/O
            let image = context.makeImage()!
            let imageDestination = CGImageDestinationCreateWithURL(urls[i] as CFURL, fileType.uti, 1, nil)!
            CGImageDestinationAddImage(imageDestination, image, nil)
            CGImageDestinationFinalize(imageDestination)
        }
        return urls
    }
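
    The comment in ImageFileType refers to CGImageDestinationCopyTypeIdentifiers(). As a small sketch, assuming macOS with the Image I/O framework available, the following prints the type identifiers Image I/O reports it can write on the current machine. The variable name supportedTypes is only illustrative:

    ```swift
    import Foundation
    import ImageIO

    // Ask Image I/O which uniform type identifiers it can write on this system
    let supportedTypes = (CGImageDestinationCopyTypeIdentifiers() as NSArray) as? [String] ?? []

    // Print them in alphabetical order, one per line
    for identifier in supportedTypes.sorted() {
        print(identifier)
    }
    ```

    Checking this list first is a cheap way to confirm a format is writable before adding another static member for it.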
    

    Usage:

    let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
    let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
    let urls = try convertPDF(at: sourceURL, to: destinationURL, fileType: .png, dpi: 200)
    

    Conversion is now blisteringly fast, and memory usage is a lot lower than with my original function. The higher the DPI, the more CPU and memory it needs. I can't say whether GPU acceleration plays a role, as I only have a weak Intel integrated GPU.
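
    To put a rough number on the DPI cost, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel (8 bits each for red, green and blue plus one skipped byte) and no row padding, so the real bitmap may be somewhat larger. The helper estimatedBitmapBytes is hypothetical and not part of the function above:

    ```swift
    import Foundation

    /// Rough lower bound on the bitmap size for one page, in bytes.
    /// Assumes 4 bytes per pixel and no row padding (illustrative helper only).
    func estimatedBitmapBytes(widthInPoints: Double, heightInPoints: Double, dpi: Double) -> Int {
        // PDF user space is 72 points per inch, so pixels = points * dpi / 72
        let pixelWidth = Int((widthInPoints * dpi / 72).rounded())
        let pixelHeight = Int((heightInPoints * dpi / 72).rounded())
        return pixelWidth * pixelHeight * 4
    }

    // A US Letter page is 612 x 792 points
    let at200 = estimatedBitmapBytes(widthInPoints: 612, heightInPoints: 792, dpi: 200)
    let at400 = estimatedBitmapBytes(widthInPoints: 612, heightInPoints: 792, dpi: 400)
    print(at200)  // 14960000 (1700 x 2200 pixels)
    print(at400)  // 59840000 (3400 x 4400 pixels)
    ```

    Doubling the DPI quadruples the pixel count, and since several pages are rendered at the same time, peak memory is roughly this figure multiplied by the number of pages in flight.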
