I need to generate a customized PDF copy of a template document. The easiest way - I thought - was to create a source PDF that has some placeholder text where customization need
There is no definite solution but I found 2 solutions that works most of the time.
In python https://github.com/JoshData/pdf-redactor gives good results. Here is the example code:
# Redact things that look like social security numbers, replacing the
# text with X's.
options.content_filters = [
# First convert all dash-like characters to dashes.
(
re.compile(u"Tom Xavier"),
lambda m : "XXXXXXX"
),
# Then do an actual SSL regex.
# See https://github.com/opendata/SSN-Redaction for why this regex is complicated.
(
re.compile(r"(?
Full Example can be found here
In ruby https://github.com/gettalong/hexapdf works for black out text. Example code:
require 'hexapdf'
class ShowTextProcessor < HexaPDF::Content::Processor
def initialize(page, to_hide_arr)
super()
@canvas = page.canvas(type: :overlay)
@to_hide_arr = to_hide_arr
end
def show_text(str)
boxes = decode_text_with_positioning(str)
return if boxes.string.empty?
if @to_hide_arr.include? boxes.string
@canvas.stroke_color(0, 0 , 0)
boxes.each do |box|
x, y = *box.lower_left
tx, ty = *box.upper_right
@canvas.rectangle(x, y, tx - x, ty - y).fill
end
end
end
alias :show_text_with_positioning :show_text
end
file_name = ARGV[0]
strings_to_black = ARGV[1].split("|")
doc = HexaPDF::Document.open(file_name)
puts "Blacken strings [#{strings_to_black}], inside [#{file_name}]."
doc.pages.each.with_index do |page, index|
processor = ShowTextProcessor.new(page, strings_to_black)
page.process_contents(processor)
end
new_file_name = "#{file_name.split('.').first}_updated.pdf"
doc.write(new_file_name, optimize: true)
puts "Writing updated file [#{new_file_name}]."
In this you can black out text on select text will be visible.