Question
Using a .NET application, I am trying to create a PDF "table of contents" that references other files, such as one would distribute on a DVD.
For this purpose, I need a search index and catalog, so full-text search will work across documents. I have been able to automate the construction of the index by copying an "old" .pdx file (the directory structure is always the same) and then calling JavaScript from C#:
var js = $@"catalog.getIndex(""{pdxFilePath}"").build('alert(""Hello"")', true)";
formFields.ExecuteThisJavascript(js);
But how can I associate the .pdx file with my .pdf document, so it gets loaded automatically?
In Acrobat, this is set in the "advanced" document properties:
However, this is not accessible via the info or metadata properties of the document.
Apparently this is stored somewhere else, but I don't know enough about the PDF format to figure out how to access this data:
Any help would be highly appreciated. I could use both the Adobe SDK/JavaScript API or some other library (for instance, I know we already have an Aspose license).
Answer 1:
The /Search entry is not documented in the PDF specification; it is probably an Adobe extension.
You can use any library that supports low-level COS objects (dictionaries, strings, numbers, streams, etc.), but since the entry is not documented, you can only infer its structure from sample PDF files.
Answer 2:
Answering my own question here... I was able to solve this using PdfSharp.
The following code is compatible with PdfSharp 1.50.4845-RC2a.
pdxFile should be the name of the .pdx file including the file extension (e.g. "catalog.pdx"). I have only tested this with .pdx files located in the same folder as the PDF document, but I would assume that relative paths in general should work.
No guarantees that this is a perfect solution as I lack a deeper understanding of the PDF format, but this seems to work at least.
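// Requires: using PdfSharp.Pdf;
// Adds a /Search entry to the document catalog that points to the given .pdx file.
private void SetSearchCatalog(PdfDocument doc, string pdxFile)
{
    // File specification dictionary that references the .pdx file.
    var indexDict = new PdfDictionary(doc);
    indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
    indexDict.Elements["/Type"] = new PdfName("/Filespec");

    // Single entry of the /Indexes array, marked as a PDX index.
    var indexArrayItemDict = new PdfDictionary(doc);
    indexArrayItemDict.Elements["/Index"] = indexDict;
    indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");
    var indexArray = new PdfArray(doc, indexArrayItemDict);

    // The /Search dictionary is attached to the document catalog.
    var searchDict = new PdfDictionary(doc);
    searchDict.Elements["/Indexes"] = indexArray;
    doc.Internals.Catalog.Elements["/Search"] = searchDict;
}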
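For anyone using a different library, the object structure that the method above adds to the document catalog should look roughly like the following in PDF syntax. This is only a sketch inferred from the code, not taken from any specification: catalog.pdx is a placeholder file name, the other catalog entries are omitted, and a real file may store some of these dictionaries as indirect objects rather than inline.

```
% Fragment of the document catalog; other entries omitted.
% Inferred from the code above, not from a specification.
/Search <<
  /Indexes [
    <<
      /Name /PDX
      /Index <<
        /Type /Filespec
        /F (catalog.pdx)   % placeholder file name
      >>
    >>
  ]
>>
```

Comparing this against a PDF whose index was set manually in Acrobat is probably the easiest way to check that the entry was written as expected.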
Source: https://stackoverflow.com/questions/51127552/how-to-associate-search-catalog-file-pdx-with-pdf-document