Extract images with PoDoFo from PDF pages

Submitted by 做~自己de王妃 on 2019-12-08 10:49:03

Question


I need to extract all images from a PDF file with PoDoFo. Extracting every image in the file works well: I used the image extractor example for that, which receives all objects in the document and iterates over them. What I actually need is to iterate over the pages and check for image objects on each page. Does anyone know how to do that?


Answer 1:


Piggybacking off podofoimgextract, you could iterate over each page, get the page's resource object, and check it for an XObject or Image entry. From there it's pretty much the exact same code that is used in the image extract utility.

for (int pageN = 0; pageN < document.GetPageCount(); pageN++) {
  PdfPage* page = document.GetPage(pageN);
  auto resources = page->GetResources();
  // Skip pages that have no usable resource dictionary.
  if (!resources || !resources->IsDictionary())
    continue;

  for (auto& k : resources->GetDictionary().GetKeys()) {
    if (k.first.GetName() == "XObject" || k.first.GetName() == "Image") {
      if (k.second->IsDictionary()) {
        for (auto& r : k.second->GetDictionary().GetKeys()) {
          // The XObject dictionary will usually contain indirect objects
          // as its values, so check for a reference.
          if (r.second->IsReference()) {
            // Get the object that is being referenced.
            auto target =
              document.GetObjects().GetObject(r.second->GetReference());
            if (target && target->IsDictionary()) {
              // Bind by reference to avoid copying the dictionary.
              auto& targetDict = target->GetDictionary();
              // Form XObjects live in the same dictionary; keep only images.
              auto subtype = targetDict.GetKey(PdfName::KeySubtype);
              if (!subtype || !subtype->IsName() ||
                  subtype->GetName().GetName() != "Image")
                continue;
              auto kf = targetDict.GetKey(PdfName::KeyFilter);
              // No /Filter entry: skip this object.
              if (!kf)
                continue;
              // Unwrap a single-element filter array such as [/DCTDecode].
              if (kf->IsArray() && kf->GetArray().GetSize() == 1 &&
                  kf->GetArray()[0].IsName() &&
                  kf->GetArray()[0].GetName().GetName() == "DCTDecode") {
                kf = &kf->GetArray()[0];
              }
              // DCTDecode alone means the stream is already a JPEG file.
              bool isJpeg =
                kf->IsName() && kf->GetName().GetName() == "DCTDecode";
              ExtractImage(target, isJpeg);
            }
          }
        }
      }
    }
  }
}
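
The filter check at the end of the loop is the part that is easiest to get wrong, so here is a minimal standalone sketch of just that decision, written without PoDoFo so it can be compiled on its own. `FilterList`, `IsPlainJpeg`, and `OutputName` are hypothetical names invented for this illustration and are not part of the PoDoFo API. The `.ppm` extension for non-JPEG images is also an assumption about how the decoded pixels would be saved.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a PDF /Filter entry. PDF allows either a single
// name or an array of names; both are modelled here as a list of names.
using FilterList = std::vector<std::string>;

// True when the stream is encoded with DCTDecode and nothing else. In that
// case the raw stream bytes are already a JPEG file and can be written out
// unchanged. Any other filter chain has to be decoded first.
bool IsPlainJpeg(const FilterList& filters) {
    return filters.size() == 1 && filters[0] == "DCTDecode";
}

// Hypothetical helper: builds an output file name for image number `index`,
// choosing the extension from the filter chain.
std::string OutputName(const FilterList& filters, int index) {
    const char* ext = IsPlainJpeg(filters) ? ".jpg" : ".ppm";
    return "image_" + std::to_string(index) + ext;
}
```

With these definitions, `OutputName({"DCTDecode"}, 0)` gives `image_0.jpg`, while `OutputName({"FlateDecode"}, 1)` gives `image_1.ppm`. A chain such as `{"FlateDecode", "DCTDecode"}` is deliberately not treated as a plain JPEG, because the outer filter still has to be removed before the JPEG data is usable.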


Source: https://stackoverflow.com/questions/43128330/extract-images-with-podofo-from-pdf-pages
