Search and replace placeholder text in PDF with Python

情歌与酒 2021-02-14 04:52

I need to generate a customized PDF copy of a template document. The easiest way - I thought - was to create a source PDF that has some placeholder text where customization need

3 answers
  •  一向 (original poster)
    2021-02-14 05:44

    There is no definitive solution, but I found two approaches that work most of the time.

    In python https://github.com/JoshData/pdf-redactor gives good results. Here is the example code:

    import re
    import pdf_redactor
    
    options = pdf_redactor.RedactorOptions()
    
    # Each content filter is a (compiled regex, replacement function) pair.
    options.content_filters = [
            # Replace a literal name with X's.
            (
                    re.compile(u"Tom Xavier"),
                    lambda m: "XXXXXXX"
            ),
    
            # Then redact things that look like social security numbers (SSNs).
            # See https://github.com/opendata/SSN-Redaction for why this regex is complicated.
            (
                    re.compile(r"(?

    The full example can be found in the pdf-redactor repository linked above.
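
    To make the filter idea concrete, here is a minimal sketch in plain Python of how a list of (regex, replacement function) pairs can be applied in order to a string. It does not use pdf-redactor itself. The `apply_filters` helper, the `{{KEY}}` placeholder syntax, and the sample values are all assumptions made for illustration. In a real PDF the text may also be split across several drawing operations, so a placeholder is not guaranteed to match as one string.

```python
import re


def apply_filters(text, filters):
    """Apply each (pattern, replacement function) pair to text, in list order."""
    for pattern, replace in filters:
        text = pattern.sub(replace, text)
    return text


# Hypothetical template values; the {{KEY}} syntax is an assumption,
# not something pdf-redactor defines.
values = {"NAME": "Alice", "DATE": "2021-02-14"}

filters = [
    # Fill each {{KEY}} placeholder; unknown keys are left untouched.
    (re.compile(r"\{\{(\w+)\}\}"), lambda m: values.get(m.group(1), m.group(0))),
    # Redact a literal name, as in the example above.
    (re.compile(r"Tom Xavier"), lambda m: "XXXXXXX"),
]

result = apply_filters("Dear {{NAME}}, prepared by Tom Xavier on {{DATE}}.", filters)
print(result)
```

    The same list shape could be assigned to `options.content_filters`, assuming the placeholder text survives as a single string in the PDF's content stream.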

    In Ruby, https://github.com/gettalong/hexapdf works for blacking out text. Example code:

    require 'hexapdf'
    
    class ShowTextProcessor < HexaPDF::Content::Processor
    
      def initialize(page, to_hide_arr)
        super()
        @canvas = page.canvas(type: :overlay)
        @to_hide_arr = to_hide_arr
      end
    
      def show_text(str)
        boxes = decode_text_with_positioning(str)
        return if boxes.string.empty?
        if @to_hide_arr.include? boxes.string
          # The rectangles are filled, so set the fill color (black), not the stroke color.
          @canvas.fill_color(0, 0, 0)
    
          boxes.each do |box|
            x, y = *box.lower_left
            tx, ty = *box.upper_right
            @canvas.rectangle(x, y, tx - x, ty - y).fill
          end
        end
    
      end
      alias :show_text_with_positioning :show_text
    
    end
    
    file_name = ARGV[0]
    strings_to_black = ARGV[1].split("|")
    
    doc = HexaPDF::Document.open(file_name)
    puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
    doc.pages.each do |page|
      processor = ShowTextProcessor.new(page, strings_to_black)
      page.process_contents(processor)
    end
    
    # Strip only a trailing ".pdf" so paths containing other dots are preserved.
    new_file_name = "#{file_name.sub(/\.pdf\z/i, '')}_updated.pdf"
    doc.write(new_file_name, optimize: true)
    
    puts "Writing updated file [#{new_file_name}]."
    

    This draws black rectangles over the matching text, but the text itself stays in the file and becomes visible again when you select it.
