Data extraction from /Filter /FlateDecode PDF stream in PHP

离开以前 2020-12-22 20:17

I cannot decompress the data from a stream like this:

    56 0 obj 
    << /Length 1242 /Filter /FlateDecode >>
    stream
    x]êΩnƒ Ñ{ûbÀKq¬æ\\âê¢..         


        
4 answers
  • 2020-12-22 20:42

    Since you didn't say whether you need a single decompressed stream or all of them, I suggest a simple command-line tool that decompresses the complete PDF in one go: Jay Berkenbilt's qpdf.

    Example command line:

     qpdf --qdf --object-streams=disable in.pdf out.pdf
    

    out.pdf can then be inspected in a text editor (only embedded ICC profiles, images and fonts could still be binary).

    qpdf will also automatically re-order the objects and display the PDF syntax in a normalized way (and tell you in a comment what the original object ID of each decompressed object was).

    Should you need to recompress the file (for example after editing it), just run this command:

     qpdf out-edited.pdf out-recompressed.pdf
    

    (You may see a warning message saying that the utility attempted to repair a damaged file.)

    qpdf is multi-platform and available from SourceForge.

  • 2020-12-22 20:45

    I just used

    import de.intarsys.pdf.filter.FlateFilter;
    

    from jPod (on SourceForge), and it works well:

    FlateFilter filter = new FlateFilter(null);
    byte[] decoded = filter.decode(bytes, start, end - start);
    

    The bytes are taken straight from the PDF file.

  • 2020-12-22 20:53

    Long overdue, but someone might find it helpful. In this case (<< /Length 1242 /Filter /FlateDecode >>), all you need to do is pass the isolated binary string (basically everything between "stream" and "endstream") to zlib.decompress:

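    import zlib

    # Read the raw bytes in binary mode; a bytes literal cannot hold non-ASCII characters.
    # "stream.bin" is a placeholder for a file holding everything between "stream" and "endstream".
    with open("stream.bin", "rb") as f:
        stream = f.read()
    data = zlib.decompress(stream)  # Here you have your clean decompressed stream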
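Isolating the bytes between "stream" and "endstream" can be sketched roughly as below. This is a minimal illustration, not a PDF parser: the function name `extract_flate_stream` is made up for this example, and it assumes that /Length is written as a direct integer (not an indirect reference such as `12 0 R`) and that the stream has no /DecodeParms.

```python
import re
import zlib

# Matches "/Length <n> ... >>" followed by the "stream" keyword and its end-of-line.
# Assumption: /Length is a direct integer inside the stream dictionary.
STREAM_RE = re.compile(rb"/Length\s+(\d+)[^>]*>>\s*stream\r?\n")


def extract_flate_stream(pdf_bytes: bytes) -> bytes:
    """Decompress the first stream found in pdf_bytes (hypothetical helper)."""
    match = STREAM_RE.search(pdf_bytes)
    if match is None:
        raise ValueError("no stream with a direct /Length found")
    length = int(match.group(1))
    # The stream data starts right after the end-of-line that follows "stream".
    raw = pdf_bytes[match.end():match.end() + length]
    return zlib.decompress(raw)


# Build a tiny PDF-like fragment to try it out.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(payload)
fragment = (
    b"56 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)
print(extract_flate_stream(fragment))
```

Using /Length instead of searching for "endstream" avoids cutting the data short if the compressed bytes happen to contain that keyword.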
    

    However, if your PDF object has a /DecodeParms entry, things become more complicated. After decompression you will also need the /Predictor value and the /Columns count to undo the prediction. It is better to use PyPDF2 for this.
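To give an idea of what the predictor step involves, here is a rough sketch of undoing PNG-style prediction (/Predictor 10 to 15) after decompression. The names `undo_png_predictor` and `paeth` are invented for this example, and it assumes the defaults /Colors 1 and /BitsPerComponent 8, so every row is one filter-type byte followed by /Columns data bytes. The TIFF predictor (/Predictor 2) is not covered.

```python
def paeth(left: int, up: int, up_left: int) -> int:
    """Paeth predictor as defined by the PNG filter algorithms."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return up
    return up_left


def undo_png_predictor(data: bytes, columns: int) -> bytes:
    """Reverse PNG row filters, assuming one byte per sample and one sample per pixel."""
    row_len = columns + 1  # one filter-type byte precedes each row
    if len(data) % row_len != 0:
        raise ValueError("data length is not a whole number of rows")
    out = bytearray()
    prev = bytearray(columns)  # the row above the first row counts as all zeros
    for start in range(0, len(data), row_len):
        ftype = data[start]
        row = bytearray(data[start + 1:start + row_len])
        for x in range(columns):
            left = row[x - 1] if x > 0 else 0      # already decoded
            up = prev[x]
            up_left = prev[x - 1] if x > 0 else 0
            if ftype == 0:      # None
                pred = 0
            elif ftype == 1:    # Sub
                pred = left
            elif ftype == 2:    # Up
                pred = up
            elif ftype == 3:    # Average
                pred = (left + up) // 2
            elif ftype == 4:    # Paeth
                pred = paeth(left, up, up_left)
            else:
                raise ValueError(f"unknown PNG filter type {ftype}")
            row[x] = (row[x] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

For a stream with `/DecodeParms << /Predictor 12 /Columns 3 >>`, the call would be `undo_png_predictor(zlib.decompress(stream), 3)`.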

  • 2020-12-22 21:01
    header('Content-Type: text/plain');     // the decoded result is sent to the client as plain text
    $n = "binary_file.bin";                 // file holding the raw bytes between "stream" and "endstream"
    $c = file_get_contents($n);             // read the whole file as a binary string
    if ($c === false) {
        exit("Cannot read $n");
    }
    $u = gzuncompress($c);                  // zlib inflate, exactly what the /FlateDecode filter needs
    if ($u === false) {
        exit("Not a valid zlib stream");
    }
    echo $u;                                // write the decoded data to the output
    

    Jingle bells! Jingle bells!...

    gzuncompress() is the solution.
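Why gzuncompress() and not gzinflate(): /FlateDecode data is normally in zlib format, meaning a 2-byte header, the deflate data, and a 4-byte Adler-32 checksum. gzuncompress() expects exactly that wrapper, while gzinflate() expects the bare deflate data. The stream in the question starts with "x" (0x78), which is the usual first zlib header byte. The Python sketch below illustrates the difference; the helper name `inflate` and the fallback to raw deflate are choices made for this example, not part of any answer above.

```python
import zlib

payload = b"example content stream"
zlib_data = zlib.compress(payload)   # zlib format, as in a /FlateDecode stream
raw_deflate = zlib_data[2:-4]        # strip the 2-byte header and the 4-byte Adler-32


def inflate(data: bytes) -> bytes:
    """Try the zlib format first, then fall back to raw deflate."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        # A negative wbits value tells zlib to expect raw deflate without a header.
        return zlib.decompress(data, wbits=-15)


print(hex(zlib_data[0]))             # first header byte of the zlib wrapper
print(inflate(zlib_data) == inflate(raw_deflate) == payload)
```

A fallback like this can help with PDFs written by tools that omit or damage the zlib header, at the cost of skipping the checksum verification in that case.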
