PDF form field in a text editor

Submitted by 半城伤御伤魂 on 2019-12-11 20:35:06

Question


To make a long story short: I would like to edit a read-only field in a PDF form using ONLY a text editor. I've succeeded, but I would like to understand why it doesn't work in some cases...

I've noticed that if I start from a PDF 1.5 version of my original document (without fields, saved as PDF by Word 2010), add the field with Acrobat Pro XI, and save it using Save as other... -> Optimized PDF with compatibility set to Acrobat 6.0, my field looks like this in a text editor (Notepad++):

<</AP<</N 28 0 R>>/DA(/Helv 12 Tf 0 g)/DV(mytextfield)/F 4/FT/Tx/Ff 1/MK<<>>/P 3 0 R/Rect[99.4934 686.99 249.493 708.99]/Subtype/Widget/T(%mytextfield)/Type/Annot/V(mytextfield)>>
endobj
28 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 88/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 20 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

This is very easy to modify: every occurrence of 'mytextfield' is the content of my field, and '%mytextfield' is the name of my field.

On the other hand, if I take my PDF 1.5 (saved by Word 2010) and, after adding the field in Acrobat Pro XI, save it normally (Save as) instead of as an optimized PDF, I get a PDF 1.6 with the following (in Notepad++):

<</AcroForm 25 0 R/Lang(fr-CH)/MarkInfo<</Marked true>>/Metadata 3 0 R/Pages 15 0 R/StructTreeRoot 8 0 R/Type/Catalog>>
endobj
19 0 obj
<</Annots 26 0 R/Contents 22 0 R/CropBox[0 0 595.32 841.92]/Group<</CS/DeviceRGB/S/Transparency/Type/Group>>/MediaBox[0 0 595.32 841.92]/Parent 15 0 R/Resources<</ExtGState<</GS0 30 0 R>>/Font<</TT0 33 0 R>>/ProcSet[/PDF/Text]>>/Rotate 0/StructParents 0/Tabs/S/Type/Page>>
endobj
20 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 85/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 28 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

This format is not easy to edit (if I change mytextfield, I get a corrupted document!). It would be fine if opening this PDF 1.6 in Acrobat Pro and saving it with the optimized PDF trick mentioned above transformed the field into the first format, but that's not the case! Instead I get the exact same field format.

So my questions are the following:

  1. Is there a way to ensure that my PDF form, no matter which PDF version the original is, gets converted to the right format (field easy to edit) using Acrobat Pro or any other program?
  2. Is there a way to easily edit the PDF 1.6 fields?

Answer 1:


The OP made clear in comments that during his edits he replaced PDF data with something longer or shorter.

In general this is a bad idea, because PDF files have a cross-reference table (or stream) that records the byte offset of each indirect object (each nnn 0 obj...endobj). Replacing PDF data with data of a different length invalidates this cross-reference information for all objects following the edited position. If the edit is inside a stream, that stream's /Length entry becomes wrong as well.
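The offset shift can be seen on a toy example. The following is a minimal Python sketch, not a real PDF parser: the byte string `pdf` and the helper `object_offsets` are made up for illustration, and the scan assumes every object has generation 0 and starts at the beginning of a line.

```python
import re

# A tiny PDF-like byte string with two indirect objects (illustrative only).
pdf = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<</T(%mytextfield)/V(mytextfield)>>\nendobj\n"
    b"2 0 obj\n<</Type/Catalog>>\nendobj\n"
)

def object_offsets(data: bytes) -> dict:
    """Map each object number to the byte offset of its 'n 0 obj' header."""
    return {int(m.group(1)): m.start()
            for m in re.finditer(rb"(?m)^(\d+) 0 obj", data)}

before = object_offsets(pdf)

# Replace the field value with a longer one, as one would in a text editor.
edited = pdf.replace(b"/V(mytextfield)", b"/V(a much longer value)")
after = object_offsets(edited)

# Number of bytes by which everything after the edit has moved.
shift = len(edited) - len(pdf)

for num in sorted(before):
    print(num, before[num], "->", after[num])
```

Object 1 starts before the edit, so its offset is unchanged; object 2 comes after it and moves by `shift` bytes, which is exactly what an untouched cross-reference table no longer reflects.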

Thus, to have a valid PDF after editing, one at least has to update the cross-reference information, which in a plain text editor is a real hassle (in the case of cross-reference tables) or even virtually impossible (in the case of compressed cross-reference streams).
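For a file with a classic cross-reference table (not a cross-reference stream), this bookkeeping can be scripted rather than done by hand. The following is a minimal Python sketch under several assumptions: objects are numbered contiguously from 1, all have generation 0, the trailer dictionary contains no nested dictionaries, and no stream contains a line that looks like an object header. `rebuild_xref` and `xref_is_consistent` are hypothetical helper names, not part of any library.

```python
import re

def rebuild_xref(data: bytes) -> bytes:
    """Regenerate the classic xref table, trailer and startxref of `data`."""
    xref_pos = data.rfind(b"\nxref\n") + 1          # start of the old table
    body = data[:xref_pos]                          # header plus all objects
    trailer = re.search(rb"trailer\s*(<<.*?>>)", data[xref_pos:], re.S).group(1)
    # Naive scan: assumes no stream contains a line that looks like "n 0 obj".
    offsets = {int(m.group(1)): m.start()
               for m in re.finditer(rb"(?m)^(\d+) 0 obj", body)}
    size = max(offsets) + 1
    table = b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in range(1, size):                      # each entry is 20 bytes
        table += b"%010d 00000 n \n" % offsets[num]
    return (body + table + b"trailer\n" + trailer
            + b"\nstartxref\n%d\n%%%%EOF\n" % len(body))

def xref_is_consistent(data: bytes) -> bool:
    """Check that startxref and every in-use entry point where they claim to."""
    start = int(re.search(rb"startxref\s+(\d+)", data).group(1))
    if not data.startswith(b"xref\n", start):
        return False
    entries = re.findall(rb"(\d{10}) \d{5} n ", data[start:])
    return all(data.startswith(b"%d 0 obj" % (i + 1), int(off))
               for i, off in enumerate(entries))

# Toy file with a placeholder xref section; rebuild_xref fills in real offsets.
raw = (b"%PDF-1.4\n"
       b"1 0 obj\n<</FT/Tx/T(%mytextfield)/V(mytextfield)>>\nendobj\n"
       b"2 0 obj\n<</Type/Catalog/Pages 3 0 R>>\nendobj\n"
       b"3 0 obj\n<</Type/Pages/Kids[]/Count 0>>\nendobj\n"
       b"xref\n0 1\n0000000000 65535 f \n"
       b"trailer\n<</Size 4/Root 2 0 R>>\nstartxref\n0\n%%EOF\n")

valid = rebuild_xref(raw)
edited = valid.replace(b"/V(mytextfield)", b"/V(a longer value)")  # stale xref
repaired = rebuild_xref(edited)

print(xref_is_consistent(valid), xref_is_consistent(edited),
      xref_is_consistent(repaired))
```

This only covers the cross-reference table. An edit inside a stream, such as the appearance stream shown in the question, would additionally require correcting that stream's /Length, and the approach does not apply to compressed cross-reference streams at all.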

Details can be found in the PDF specification ISO 32000-1.

Furthermore, the OP indicated that he checked document validity after his edits by opening the files in a PDF viewer.

This is not a good idea either, because well-known PDF viewers tend to repair invalid PDFs on the fly without necessarily saying so. Programs that manipulate PDFs more often require valid input (at least valid in the aspects they manipulate) and will therefore probably reject or (even worse) garble the edited PDFs.

The OP indicates his task has been described in this question. Unless there is some appropriate JS library out there, he will essentially have to program one according to his needs.

It might be advantageous to use incremental updates here instead of manipulating the inner information of the source PDF. For this, see section 7.5.6, Incremental Updates, in the specification mentioned above.
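To illustrate the idea, here is a minimal Python sketch of an incremental update that leaves the original bytes untouched and appends a new version of one object, a one-entry cross-reference section, and a trailer whose /Prev points at the previous cross-reference section. It assumes a file with a classic cross-reference table, generation-0 objects, and a trailer without nested dictionaries; `build_base` and `incremental_update` are hypothetical names, and the toy file is not a complete form.

```python
import re

def build_base() -> bytes:
    """Assemble a tiny PDF-like file with a correct classic xref table."""
    objs = [b"<</FT/Tx/T(%mytextfield)/V(mytextfield)/Ff 1>>",
            b"<</Type/Catalog/Pages 3 0 R>>",
            b"<</Type/Pages/Kids[]/Count 0>>"]
    out = b"%PDF-1.4\n"
    offsets = []
    for i, body in enumerate(objs, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (i, body)
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objs) + 1)
    out += b"".join(b"%010d 00000 n \n" % o for o in offsets)
    out += b"trailer\n<</Size %d/Root 2 0 R>>\n" % (len(objs) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return out

def incremental_update(data: bytes, obj_num: int, new_dict: bytes) -> bytes:
    """Append a replacement for object `obj_num` without touching `data`."""
    prev = int(re.findall(rb"startxref\s+(\d+)", data)[-1])
    trailer = re.findall(rb"trailer\s*<<(.*?)>>", data, re.S)[-1]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+ \d+ R)", trailer).group(1)

    out = data if data.endswith(b"\n") else data + b"\n"
    obj_offset = len(out)                      # where the new object starts
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, new_dict)
    xref_offset = len(out)                     # where the new xref section starts
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<</Size %d/Root %s/Prev %d>>\n" % (size, root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out

base = build_base()
updated = incremental_update(
    base, 1, b"<</FT/Tx/T(%mytextfield)/V(a new, longer value)/Ff 1>>")
print(len(base), len(updated))
```

Because nothing before the appended section moves, none of the original offsets need recalculating. For a real field like the one in the question, the appearance stream referenced by /AP would need its own replacement object in the same update, any other trailer entries (such as /Info or /ID) should be carried over, and a file that uses cross-reference streams would, to my understanding, need the update written as a cross-reference stream too, which this sketch does not attempt.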

PS The OP asked

would incremental updates work with read-only fields

Incremental updates are simply a different way to organize your changes: everything you can change inside the original file you can also change using incremental updates. Actually, you can do even more with incremental updates: in signed documents certain changes are often still allowed, but they must be made as incremental updates, as otherwise the signature would be structurally broken.



Source: https://stackoverflow.com/questions/24986609/pdf-form-field-in-a-text-editor
