My iOS app extracts embedded files from PDFs. Now I'm trying to build an Android app with the same functionality using MuPDF.
On iOS, I can use Quartz 2D to extract embedded files:
- Access the root PDF dictionary (CGPDFDocumentGetCatalog)
- Get the files array (Names > EmbeddedFiles > Names) and iterate through it
- Copy the file stream contents from the file dictionary (EF > F) to NSData and save it
Is there any way to do this with MuPDF?
A solution based on pdfextract.c looks like brute force, but it works:
- Iterate through all PDF objects (pdf_load_object)
- Determine whether the object is an embedded file (isembed)
- If it is, access its stream and save the file (saveembed)
In most of my test cases the embedded files are stored near the end of the file, so iterating in reverse makes sense.
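The reverse scan could be structured roughly as below. This is only a sketch in plain C with the MuPDF calls left out so that it compiles on its own: visit_fn, scan_reverse, struct trace and record_visit are hypothetical names, not MuPDF API. In a real program the callback would load object num with pdf_load_object, test it with isembed and hand it to saveembed. The loop stops at 1 because object 0 is the reserved head of the free list in a PDF cross-reference table.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if object `num` turned out to be
 * an embedded file and was saved. */
typedef int (*visit_fn)(int num, void *user);

/* Visit object numbers count-1 down to 1 and return how many of them the
 * callback reported as saved. Object 0 is skipped on purpose. */
static int scan_reverse(int count, visit_fn visit, void *user)
{
    int saved = 0;
    for (int num = count - 1; num >= 1; num--)
        if (visit(num, user))
            saved++;
    return saved;
}

/* Stand-in callback for illustration only: records the visiting order and
 * pretends that even-numbered objects are embedded files. */
struct trace { int order[16]; int n; };

static int record_visit(int num, void *user)
{
    struct trace *t = (struct trace *)user;
    if (t->n < 16)
        t->order[t->n++] = num;
    return num % 2 == 0;
}
```

Here count stands for the number of entries in the document's xref table; how to obtain it depends on the MuPDF version in use.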
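#include <stdio.h>
#include <string.h>
/* MuPDF headers are version-specific and omitted here. doc (pdf_document *)
 * and ctx (fz_context *) are the file-scope globals from pdfextract.c. */

/* An embedded file is described by a dictionary with /Type /Filespec. */
static int isembed(pdf_obj *obj) {
    pdf_obj *type = pdf_dict_gets(obj, "Type");
    return pdf_is_name(type) && !strcmp(pdf_to_name(type), "Filespec");
}

/* Write the stream referenced by EF > F to a file named by the F string. */
static void saveembed(pdf_obj *dict) {
    pdf_obj *name = pdf_dict_gets(dict, "F");
    if (!pdf_is_string(name)) return;   /* no usable file name */
    char *filename = pdf_to_str_buf(name);
    pdf_obj *ef = pdf_dict_gets(dict, "EF");
    if (!ef) return;                    /* file specification without embedded data */
    pdf_obj *stream = pdf_dict_gets(ef, "F");
    if (!stream) return;
    unsigned char *data;
    fz_buffer *buf = pdf_load_stream(doc, pdf_to_num(stream), pdf_to_gen(stream));
    int len = fz_buffer_storage(ctx, buf, &data);
    printf("extracting embedded file %s\n", filename);
    FILE *f = fopen(filename, "wb");
    if (f) {
        if (fwrite(data, 1, len, f) != (size_t)len)
            fprintf(stderr, "short write for %s\n", filename);
        fclose(f);
    } else {
        perror(filename);
    }
    fz_drop_buffer(ctx, buf);
}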
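One caveat with saveembed: the F string comes from the PDF itself and is passed straight to fopen, so a file specification such as ../../x would be written outside the current directory. Below is a minimal sketch of stripping directory components first, assuming it is acceptable to keep only the last path component. embed_basename and the fallback name embedded.bin are made up for this example.

```c
#include <string.h>

/* Return the part of `path` after the last '/', '\\' or ':' separator.
 * Falls back to "embedded.bin" (a name chosen for this sketch) when nothing
 * usable remains, for example for "", "dir/" or "..". */
static const char *embed_basename(const char *path)
{
    const char *base = path;
    for (const char *p = path; *p; p++)
        if (*p == '/' || *p == '\\' || *p == ':')
            base = p + 1;
    if (*base == '\0' || strcmp(base, ".") == 0 || strcmp(base, "..") == 0)
        return "embedded.bin";
    return base;
}
```

saveembed could then call fopen(embed_basename(filename), "wb") instead of fopen(filename, "wb").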
Source: https://stackoverflow.com/questions/14503948/how-to-extract-embedded-files-from-pdf-using-mupdf