How to edit docx with nokogiri and rubyzip

走了就别回头了 2021-02-01 10:27

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I'm using rubyzip to unzip the .docx file and then using nokogiri to parse and change the body of the wo

3 answers
  • 2021-02-01 11:11

    According to the official GitHub documentation, you should use write_buffer instead of open. There's also a code example at the link.

  • 2021-02-01 11:12

    I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.

    Here's my working proof of concept:

    #!/usr/bin/env ruby
    
    require 'rubygems'
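    # rubyzip gem, pre-1.0 API. On rubyzip 1.0+ use require 'zip'
    # and Zip::File in place of Zip::ZipFile throughout.
    require 'zip/zip'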
    require 'nokogiri'
    
    class WordXmlFile
      def self.open(path, &block)
        self.new(path, &block)
      end
    
      def initialize(path, &block)
        @replace = {}
        if block_given?
          @zip = Zip::ZipFile.open(path)
          yield(self)
          @zip.close
        else
          @zip = Zip::ZipFile.open(path)
        end
      end
    
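      # Fill each MERGEFIELD in word/document.xml with the matching value from rec.
      def merge(rec)
        xml = @zip.read("word/document.xml")
        doc = Nokogiri::XML(xml) {|x| x.noent}
        (doc/"//w:fldSimple").each do |field|
          # The field name follows the MERGEFIELD keyword in the w:instr attribute.
          if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
            text_node = (field/".//w:t").first
            if text_node
              text_node.inner_html = rec[$1].to_s
            else
              puts "No text node for #{$1}"
            end
          end
        end
        # Serialize without formatting options and queue the result for save.
        @replace["word/document.xml"] = doc.serialize :save_with => 0
      end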
    
      # Write a new archive at path: copy every entry from the source,
      # substituting the entries queued in @replace.
      def save(path)
        Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
          @zip.each do |entry|
            out.get_output_stream(entry.name) do |o|
              if @replace[entry.name]
                o.write(@replace[entry.name])
              else
                o.write(@zip.read(entry.name))
              end
            end
          end
        end
        @zip.close
      end
    end
    
    if __FILE__ == $0
      file = ARGV[0]
      # Default output name: "<input> Merged.docx" next to the input file.
      out_file = ARGV[1] || file.sub(/\.docx\z/, ' Merged.docx')
      w = WordXmlFile.open(file)
      w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
      w.save(out_file)
    end
    
  • 2021-02-01 11:25

    I stumbled across this post and know nothing about Ruby or Nokogiri, but ...

    It looks like you are re-zipping the new content incorrectly. I don't know about rubyzip, but you need a way to tell it to update the entry word/document.xml and then resave/re-zip the file.

    It looks like you are just overwriting the entry with new data, which of course is going to be a different size and will totally screw up the rest of the zip file.

    I give an example for Excel in this post: "Parse text file and create an excel report",

    which may be of use even though I am using a different zip library and VB (I'm still doing exactly what you are trying to do; my code is about halfway down).

    Here is the part that applies:

    Using z As ZipFile = ZipFile.Read(xlStream.BaseStream)
        'Grab Sheet 1 out of the file parts and read it into a string.
        Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml")
        Dim msSheet1 As New MemoryStream
        myEntry.Extract(msSheet1)
        msSheet1.Position = 0
        Dim sr As New StreamReader(msSheet1)
        Dim strXMLData As String = sr.ReadToEnd

        'Grab the data in the empty sheet and swap out the data that I want
        Dim str2 As XElement = CreateSheetData(tbl)
        Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString)
        z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace)
        'This just rezips the file with the new data it doesn't save to disk
        z.Save(fiRet.FullName)
    End Using
    