I have been searching a lot on Google about how to compress existing pdf
(size).
My problem is
I can\'t use any application, because it needs to b
Using PdfSharp
public static void CompressPdf(string targetPath)
{
using (var stream = new MemoryStream(File.ReadAllBytes(targetPath)) {Position = 0})
using (var source = PdfReader.Open(stream, PdfDocumentOpenMode.Import))
using (var document = new PdfDocument())
{
var options = document.Options;
options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
options.UseFlateDecoderForJpegImages = PdfUseFlateDecoderForJpegImages.Automatic;
options.CompressContentStreams = true;
options.NoCompression = false;
foreach (var page in source.Pages)
{
document.AddPage(page);
}
document.Save(targetPath);
}
}
GhostScript is AGPL licensed software that can compress PDFs. There is also an AGPL licensed C# wrapper for it on github here.
You could use the GhostscriptProcessor
class from that wrapper to pass custom commands to GhostScript, like the ones found in this AskUbuntu answer describing PDF compression.
Here's an approach to do this (and this should work without regard to the toolkit you use):
If you have a 24-bit rgb or 32 bit cmyk image do the following:
That said, if you do can do all of this well in an unsupervised manner, you have a commercial product in its own right.
I will say that you can do most of this with Atalasoft dotImage (disclaimers: it's not free; I work there; I've written nearly all the PDF tools; I used to work on Acrobat).
One particular way to that with dotImage is to pull out all the pages that are image only, recompress them and save them out to a new PDF then build a new PDF by taking all the pages from the original document and replacing them the recompressed pages, then saving again. It's not that hard.
List<int> pagesToReplace = new List<int>();
PdfImageCollection pagesToEncode = new PdfImageCollection();
using (Document doc = new Document(sourceStream, password)) {
for (int i=0; i < doc.Pages.Count; i++) {
Page page = doc.Pages[i];
if (page.SingleImageOnly) {
pagesToReplace.Add(i);
// a PDF image encapsulates an image an compression parameters
PdfImage image = ProcessImage(sourceStream, doc, page, i);
pagesToEncode.Add(i);
}
}
PdfEncoder encoder = new PdfEncoder();
encoder.Save(tempOutStream, pagesToEncode, null); // re-encoded pages
tempOutStream.Seek(0, SeekOrigin.Begin);
sourceStream.Seek(0, SeekOrigin.Begin);
PdfDocument finalDoc = new PdfDocument(sourceStream, password);
PdfDocument replacementPages = new PdfDocument(tempOutStream);
for (int i=0; i < pagesToReplace.Count; i++) {
finalDoc.Pages[pagesToReplace[i]] = replacementPages.Pages[i];
}
finalDoc.Save(finalOutputStream);
What's missing here is ProcessImage(). ProcessImage will rasterize the page (and you wouldn't need to understand that the image might have been scaled to be on the PDF) or extract the image (and track the transformation matrix on the image), and go through the steps listed above. This is non-trivial, but it's doable.
I think you might want to make your clients aware that any of the libraries you mentioned is not completely free:
Given all of the above I assume I can drop freeware requirement.
Docotic.Pdf can reduce size of compressed and uncompressed PDFs to different degrees without introducing any destructive changes.
Gains depend on the size and structure of a PDF: For small files or files that are mostly scanned images the reduction might not be that great, so you should try the library with your files and see for yourself.
If you are most concerned about size and there are many images in your files and you are fine with loosing some of the quality of those images then you can easily recompress existing images using Docotic.Pdf.
Here is the code that makes all images bilevel and compressed with fax compression:
static void RecompressExistingImages(string fileName, string outputName)
{
using (PdfDocument doc = new PdfDocument(fileName))
{
foreach (PdfImage image in doc.Images)
image.RecompressWithGroup4Fax();
doc.Save(outputName);
}
}
There are also RecompressWithFlate
, RecompressWithGroup3Fax
and RecompressWithJpeg
methods.
The library will convert color images to bilevel ones if needed. You can specify deflate compression level, JPEG quality etc.
Docotic.Pdf can also resize big images (and recompress them at the same time) in PDF. This might be useful if images in a document are actually bigger then needed or if quality of images is not that important.
Below is a code that scales all images that have width or height greater or equal to 256. Scaled images are then encoded using JPEG compression.
public static void RecompressToJpeg(string path, string outputPath)
{
using (PdfDocument doc = new PdfDocument(path))
{
foreach (PdfImage image in doc.Images)
{
// image that is used as mask or image with attached mask are
// not good candidates for recompression
if (!image.IsMask && image.Mask == null && (image.Width >= 256 || image.Height >= 256))
image.Scale(0.5, PdfImageCompression.Jpeg, 65);
}
doc.Save(outputPath);
}
}
Images can be resized to specified width and height using one of the ResizeTo
methods. Please note that ResizeTo
method won't try to preserve aspect ratio of images. You should calculate proper width and height yourself.
Disclaimer: I work for Bit Miracle.