Question
I have the tiny goal of converting a huge load of docx files to PDFs using C# and .NET, without opening Word visibly and without using any third-party library (fewer components to manage and less money to spend). At the moment, I am trying to convert a single document correctly, and that conversion has to be as efficient as possible so that the whole load (several thousand docx files, each between 100 and 300 KB) can be converted quickly.
I am using the following namespace alias:
using Word = Microsoft.Office.Interop.Word;
and I am creating an instance of the Word application as follows (deliberately without ScreenUpdating = false, because that option causes custom borders to be left out of the converted result):
private static Word.Application word = new Word.Application()
{
    Visible = false
};
and I am opening the document to be converted like this:
Word.Document docxFile = word.Documents.Open(sourcePath, Visible: false);
There are three ways in which I have successfully converted a 100 KB docx file to a PDF:
Microsoft.Office.Interop.Word.Document.SaveAs2
docxFile.SaveAs2(FileName: outputPath,
FileFormat: Word.WdSaveFormat.wdFormatPDF);
Duration | pdf size
————————————————————————
757,5307 ms | 277 kb
Microsoft.Office.Interop.Word.Document.ExportAsFixedFormat
docxFile.ExportAsFixedFormat(OutputFileName: outputPath,
ExportFormat: Word.WdExportFormat.wdExportFormatPDF,
OptimizeFor: Word.WdExportOptimizeFor.wdExportOptimizeForOnScreen);
Duration | pdf size
————————————————————————
783,51333 ms | 285 kb
Microsoft.Office.Interop.Word.Document.PrintOut
docxFile.Activate();
docxFile.PrintOut(
OutputFileName: outputPath,
PrintToFile: true
);
Duration | pdf size
————————————————————————
998,5403 ms | 290 kb
This last option has two additional downsides:
- it opens a small dialog or popup that shows the printing progress, which I guess is what causes the slightly longer run time
- the document has to be activated beforehand with docxFile.Activate(), otherwise a COMException is thrown
Time measurement / estimation:
I simply took a DateTime.Now before starting the conversion on an already opened document and another DateTime.Now after closing that document. Then I subtracted the first from the second:
DateTime conversionBegin = DateTime.Now;
// conversion followed by closing the document
...
DateTime conversionEnd = DateTime.Now;
TimeSpan conversionTime = conversionEnd.Subtract(conversionBegin);
Console.WriteLine("Conversion time: " + conversionTime.TotalMilliseconds + " ms");
I am aware of this not being a very reliable time measurement, but I don't really need one. I just wanted to see if there are significant differences.
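For a somewhat steadier comparison, the same measurement could be sketched with System.Diagnostics.Stopwatch instead of two DateTime.Now readings. This is only a minimal sketch: the class name TimingSketch, the helper MeasureMilliseconds, and the Thread.Sleep placeholder workload are hypothetical and stand in for the actual conversion followed by closing the document.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical helper: runs the given action once and returns
    // the elapsed wall-clock time in milliseconds.
    public static double MeasureMilliseconds(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Placeholder workload (assumption): in the real program this lambda
        // would contain the conversion call followed by closing the document.
        double elapsed = MeasureMilliseconds(() => Thread.Sleep(50));
        Console.WriteLine("Conversion time: " + elapsed + " ms");
    }
}
```

Stopwatch uses a high-resolution timer where the platform provides one, so small differences between runs are less likely to be clock noise. Repeating each conversion several times and comparing averages would make the comparison more meaningful still.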
That all leads to the (one) question...
Why does the content of each of the resulting pdfs look exactly the same but the conversion time is different and the resulting files have different sizes?
I may remove the following text to avoid close votes, but for now:
This is just a request for additional information; the question to be answered is the one above.
I would also love to read opinions, hints, recommendations or advice on the questions
What is the preferred way to convert a docx to a pdf when it comes to several thousand conversions in one run?
and
What parameters or parameter values of the methods might improve the conversion time?
Source: https://stackoverflow.com/questions/60991409/docx-to-pdf-saveas2-vs-exportasfixedformat-vs-printout