Edit existing PDF in a browser

I have a web application that is currently getting a base64 representation of a PDF from the server. I'm able to use Mozilla's pdf.js to display this on a


                      
4 answers

执笔经年, 2021-01-30 14:17

Because other SO questions are being directed here, and considering how fast web technology advances (e.g. WASM), I am providing the following answer, although PDFNetJS was already able to do all this when the question was originally asked.

The requirement of "edit" was clarified to be "Basically what is needed is for users to open up a previously uploaded PDF, highlight or circle sections, and then save those annotations to the PDF back on the server." and "No text editing or manipulation of the document content needs to happen." Given that, yes, this is possible entirely in any modern browser on any modern device.

PDFTron PDFNet SDK can do all this. It provides a full-fledged, out-of-the-box document viewer with full annotation support. It can also actually edit the PDF (change/replace text, redact, extract/add/replace images, and more). Besides PDF files, it supports DOCX, PPTX, XLSX, PNG and JPG directly on the client side. Files can be loaded locally or remotely, and there is no need for slow base64 encoding/decoding.
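
For comparison, the base64 route described in the question needs a decoding step before any viewer sees the bytes. The following is a minimal sketch of that step, not code from the question or from any SDK; the function name `base64ToBytes` is made up for illustration. The resulting `Uint8Array` is the kind of value that pdf.js accepts as `getDocument({ data: ... })`.

```typescript
/**
 * Decode a base64 string into raw bytes.
 * `atob` is available in browsers and in recent Node.js versions.
 * `base64ToBytes` is a hypothetical helper name used only in this sketch.
 */
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "JVBERi0=" is the base64 form of the five bytes "%PDF-",
// the marker every PDF file starts with.
const header = base64ToBytes("JVBERi0=");
const headerText = String.fromCharCode(...Array.from(header));
```

Serving the PDF as a binary response and reading it as an `ArrayBuffer` avoids both this loop and the roughly one-third size overhead of base64.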

Demo: http://www.pdftron.com/webviewer

Samples: http://www.pdftron.com/documentation/web/samples/universal-samples

The original question also asked about Siebel support, noting that "PDFNetJS tries to retrieve a .mem file, which is some binary data. This cannot be served by the application I'm using (Siebel) so it doesn't look like this is an option."

The .mem file is for PNaCl, which is Chrome-only and can be disabled. PDFTron for Web also supports WASM and even Emscripten, one of which, if not both, should be compatible with Siebel.
                                                                        
                                                        
            
            
              
                
太阳男子, 2021-01-30 14:23

Quick answer - no. You are unlikely to find a cross-browser solution, and even less likely to find a PDF-perfect one. It is better to have the users edit HTML and generate the PDF on the server.

[Edit June 3rd 2020 - given that this question is from 2017, you may think it is outdated and discount it. As far as I am aware the answer is still relevant, and every other week someone passes through and gives it an up-vote. But if you do find a good lib or util on your travels, please come back and list it. Thanks.]

Long answer - the PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period. 

Younger viewers might be wondering how the hell this manic format got into its market-leading position and where it came from. Well, when the founding fathers of PDF were laying down the design, before XML, JSON, HTML and even the Internet, they weren't working with today's document sharing in mind. They were working on a better way to encode printing instructions - the PostScript printer driver concept. These were never expected to be edited before the printer consumed them, and they were worthless for any other purpose. Then someone noticed that you could interpret the PostScript drawing instructions to a screen, and subsequently someone spotted the fantastic potential to employ this as a transportable, cross-device display concept. And here we are.

Back to the question - to edit a PDF in any meaningful GUI way, you would need to unpack the PDF and render the components (images, formatted text, pages) to the display device; then allow folks to mess with the layout; then re-pack the PDF. You would have to do this perfectly in line with the PDF standards; otherwise the downstream consumers of your edited PDF file may crash or be unable to render it. You would have to cater for the various Acrobat standard levels, and for the shortcuts and bloat that the editing package vendors (Word, Illustrator, InDesign) chuck into the PDF file: layers, thumbnails, etc.

Then we come to colors. Have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. You would have to interpret these to a reasonable device color on the screen and back, etc.
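
To give a feel for the "screen and back" problem, here is a rough sketch of the simplest case: a naive DeviceCMYK to RGB mapping and its inverse. This is an assumption-laden approximation, not what a color-managed viewer does; real conversions go through ICC profiles, and the function names are invented for this example.

```typescript
type Rgb = { r: number; g: number; b: number };

/**
 * Naive DeviceCMYK -> RGB approximation (no ICC profile, no color management).
 * Inputs are in the range 0..1, as in PDF color operators; outputs are 0..255.
 */
function cmykToRgb(c: number, m: number, y: number, k: number): Rgb {
  const channel = (ink: number): number =>
    Math.round(255 * (1 - ink) * (1 - k));
  return { r: channel(c), g: channel(m), b: channel(y) };
}

/**
 * Inverse of the naive mapping above. Many different CMYK values map to the
 * same RGB value, so this picks the one with the largest black component.
 */
function rgbToCmyk(rgb: Rgb): [number, number, number, number] {
  const rf = rgb.r / 255;
  const gf = rgb.g / 255;
  const bf = rgb.b / 255;
  const k = 1 - Math.max(rf, gf, bf);
  if (k === 1) {
    return [0, 0, 0, 1]; // pure black: avoid dividing by zero below
  }
  return [
    (1 - rf - k) / (1 - k),
    (1 - gf - k) / (1 - k),
    (1 - bf - k) / (1 - k),
    k,
  ];
}

const cyanOnScreen = cmykToRgb(1, 0, 0, 0);
const redAsCmyk = rgbToCmyk({ r: 255, g: 0, b: 0 });
```

Even this toy version is lossy in one direction, which hints at why round-tripping colors through an editor without changing the document is hard.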

And then fonts. Fonts might be embedded, subsetted, or neither. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF. This mostly means utilising some kind of platform-dependent type library - tricky cross-platform. Plus you will need to license the fonts for appropriate use, which can be pricey for the fonts most people want to use to look hip and professional.
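
As a small illustration of "the scale defined in the PDF": for most font types, glyph widths are stored in units of 1/1000 of the font size, so laying out even one word means scaling a width table. The sketch below assumes a simple font and ignores character spacing, word spacing, horizontal scaling and Type 3 fonts. The width values and the function name are illustrative assumptions; real values come from the font's own width table.

```typescript
// Illustrative advance widths in glyph-space units (1/1000 of the font size).
// These numbers are assumptions for the example, not read from a real file.
const exampleWidths: Record<string, number> = { H: 722, e: 556, l: 222, o: 556 };

/**
 * Width of a run of text in points for a simple font.
 * `fallbackWidth` is a hypothetical default for characters missing
 * from the table.
 */
function textWidthInPoints(
  text: string,
  widths: Record<string, number>,
  fontSizePt: number,
  fallbackWidth: number = 500,
): number {
  let total = 0;
  for (const ch of text.split("")) {
    const w = widths[ch];
    total += w !== undefined ? w : fallbackWidth;
  }
  // Glyph space is 1000 units per unit of text space for these fonts.
  return (total / 1000) * fontSizePt;
}

const helloWidth = textWidthInPoints("Hello", exampleWidths, 12);
```

An editor that lets the user change that word has to redo this calculation with glyphs that a subsetted font may not even contain.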

Given the layering, scaling and rotating features in PDF, you would likely be looking at an HTML canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for word-processing type functions.

Not impossible but hard.

Components that render PDF to a display are largely acting as print drivers, slavishly obeying the PDF drawing instructions, and usually generating a raster or sometimes an SVG graphic. This is a one-way street - they read and draw, but there is no sense of 'handles' to the objects drawn. No handles means no manipulation, and these guys certainly have little intention of letting you modify and write back.
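
To make the "drawing instructions" point concrete, here is a rough sketch that splits a tiny, hand-written page content stream into operator/operand groups. It is a deliberately simplified tokenizer written for this example (no nested or escaped strings, no hex strings, arrays, dictionaries, inline images, or operators containing digits), not a real PDF parser, and all names in it are invented.

```typescript
type Token =
  | { kind: "number"; value: number }
  | { kind: "name"; value: string }
  | { kind: "string"; value: string }
  | { kind: "operator"; value: string };

interface Instruction {
  operator: string;
  operands: Token[];
}

// Simplified grammar: (literal string) | /Name | number | operator keyword.
function tokenize(stream: string): Token[] {
  const pattern =
    /\(([^()\\]*)\)|\/([^\s()<>\[\]{}\/%]*)|([+-]?(?:\d+\.?\d*|\.\d+))|([A-Za-z'"*]+)/g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(stream)) !== null) {
    if (m[1] !== undefined) tokens.push({ kind: "string", value: m[1] });
    else if (m[2] !== undefined) tokens.push({ kind: "name", value: m[2] });
    else if (m[3] !== undefined) tokens.push({ kind: "number", value: Number(m[3]) });
    else tokens.push({ kind: "operator", value: m[4] });
  }
  return tokens;
}

// PDF content streams are postfix: operands come first, then the operator.
function toInstructions(tokens: Token[]): Instruction[] {
  const instructions: Instruction[] = [];
  let operands: Token[] = [];
  for (const token of tokens) {
    if (token.kind === "operator") {
      instructions.push({ operator: token.value, operands });
      operands = [];
    } else {
      operands.push(token);
    }
  }
  return instructions;
}

// Begin text, select font F1 at 12 pt, move to (72, 720), show "Hello", end text.
const sample = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET";
const instructions = toInstructions(tokenize(sample));
```

The result is a flat list of five drawing commands. Nothing in it says "this is a paragraph" or "this is a heading", which is the missing set of handles described above.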

You will find many 'save to pdf' products. When client-side, they will be leaning toward grabbing a set of pixels and dumping a raster graphic into a file with the thinnest veneer of 'PDF' definition wrapped around it. Where they are server-based they can be quite powerful - there are plenty of tools, like Aspose and ABCPDF, that truly offer some PDF wrangling server side - but this is not what you are looking for in your OP.

Summary - very complicated subject. If anything ever emerges as a potential it will likely have many constraints in terms of the PDF features covered and thus restrictions on what it can safely edit.

If you are looking for online editing of documents that are ultimately exported as PDF, then a way forward is to keep an HTML version of the document source and have the user edit this with TinyMCE, CKEditor, etc., then use one of the server-side tools to take the saved source HTML and render it out to PDF. Tools like ABCPDF render HTML faithfully and let you add images, headers and footers, page numbers, etc.

This is a pragmatic answer to your (assumed) need, though it still has some trade-offs in terms of the font (licensing) issues, clunkiness of browser-based editors, all-round weirdness of the HTML laid down by some HTML editing components, etc. But it IS viable.

Final thoughts - rethink the scope of what you need. If editing HTML and converting to PDF on the server is usable for you, it is a well-trodden path and you will find both free and commercial components for client and server to support it.

Edit: If you need to annotate the PDF then things are much easier. On the server, you need to generate images of the pages of the document, send those to the client, display them to the user, let the user mark them up, send the co-ordinates of the annotations back to the server and use a server-side PDF library to render the annotations into the PDF. It is achievable, though it requires various skillsets for server-side PDF-to-image manipulation and client-side presentation and annotation capture.
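
One detail in that workflow is easy to get wrong: the client captures rectangles in image pixels with the origin at the top-left, while PDF page coordinates are in points with the origin at the bottom-left. A minimal sketch of the conversion follows, assuming an unrotated page whose visible area starts at (0, 0); the type and function names are hypothetical and not part of any library.

```typescript
// Hypothetical shapes for this sketch; not taken from any particular library.
interface PixelRect { x: number; y: number; width: number; height: number; }
interface PdfRect { x: number; y: number; width: number; height: number; }

/**
 * Convert a rectangle drawn on a rendered page image (pixels, origin at the
 * top-left, y growing downward) into PDF user space (points, origin at the
 * bottom-left, y growing upward).
 */
function pixelRectToPdfRect(
  rect: PixelRect,
  imageWidthPx: number,
  imageHeightPx: number,
  pageWidthPt: number,
  pageHeightPt: number,
): PdfRect {
  const scaleX = pageWidthPt / imageWidthPx;
  const scaleY = pageHeightPt / imageHeightPx;
  return {
    x: rect.x * scaleX,
    // Flip the y axis: the bottom edge of the pixel rectangle becomes
    // the lower-left corner of the PDF rectangle.
    y: pageHeightPt - (rect.y + rect.height) * scaleY,
    width: rect.width * scaleX,
    height: rect.height * scaleY,
  };
}

// A US Letter page (612 x 792 pt) rendered at 1224 x 1584 px (2 px per point).
const highlight = pixelRectToPdfRect(
  { x: 200, y: 100, width: 400, height: 50 },
  1224, 1584, 612, 792,
);
```

Here `highlight` comes out as `{ x: 100, y: 717, width: 200, height: 25 }`, which is the rectangle a server-side PDF library would be asked to draw on that page.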

Edit: Readers may be interested in knowing if the picture I painted above has changed. As of Jan 2019 I stand by what I wrote. Suppliers are coming to the market with better tools and libraries that can do more than previously. However you still need to assess your needs and confirm their restrictions - it is likely that there will be some. No vendor I am aware of yet has a client-side, cross-browser, cross-device, full capability PDF editing lib for any PDF file - there is always some limitation. But I am happy to be corrected.
                                                                        
                                                        
            
            
              
                
臣服心动, 2021-01-30 14:33

For future reference: 

I found two libraries that enable you to edit existing PDFs in the browser to a certain extent. The second one isn't documented yet, so I don't know exactly what it does. It might be the solution for such a problem in the future.


PDF Assembler
pdf-lib

                                                                        
                                                        
            
            
              
                
忘了有多久, 2021-01-30 14:35

Community:


pdf-annotate (unmaintained) 


Commercial:


metapdf (abandoned)
PDFTron
PSPDFKit

                                                                        
                                                        
            
            
              
                