I have some working code that uses a crutch to add a BOM (byte order mark) to a new file:
File.open name, 'w', 0644 do |file|
  file.write "\uFEFF"
end
Alas, I think your manual approach is the way to go; at least, I don't know a better way.
To quote from JEG2's article:
Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.
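Following that idea, here is a minimal sketch of printing the BOM bytes by hand for any UTF encoding, not just UTF-8. The helper name write_with_bom is hypothetical. It relies on the fact that U+FEFF, encoded in the target encoding, is exactly that encoding's BOM (EF BB BF for UTF-8, FF FE for UTF-16LE, and so on).

```ruby
# Hypothetical helper: write `text` to `path` in `encoding`, preceded by a BOM.
def write_with_bom(path, text, encoding = Encoding::UTF_8)
  # "wb" = binary mode, so the already-encoded bytes are written unchanged
  File.open(path, "wb") do |file|
    file.write("\uFEFF".encode(encoding)) # U+FEFF in the target encoding is its BOM
    file.write(text.encode(encoding))
  end
end
```

For example, write_with_bom("out.txt", "some content öäü", Encoding::UTF_16LE) should produce a file whose first two bytes are FF FE. Unlike patching File.open, this leaves the core class untouched.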
**** This answer led to a new gem: file_with_bom ****
I had a similar problem in the past and extended File.open with additional encoding variants for the w mode:
class File
  # BOM byte sequences per encoding, stored as binary (ASCII-8BIT) strings
  BOM_LIST_hex = {
    Encoding::UTF_8    => "\xEF\xBB\xBF".b,
    Encoding::UTF_16BE => "\xFE\xFF".b,
    Encoding::UTF_16LE => "\xFF\xFE".b,
    Encoding::UTF_32BE => "\x00\x00\xFE\xFF".b,
    Encoding::UTF_32LE => "\xFF\xFE\x00\x00".b,
  }.freeze

  # BOM bytes for the given encoding (default: the file's external encoding), or nil
  def utf_bom_hex(encoding = external_encoding)
    BOM_LIST_hex[encoding]
  end

  class << self
    alias :open_old :open

    def open(filename, *args, **options, &block)
      # bom flag: either :bom => true, or "-bom"/"|bom" inside the mode string
      bom = options.delete(:bom)
      mode_string = args.first.is_a?(String) ? args.first.dup : nil
      if mode_string && mode_string.sub!(/[-|]bom/i, '')
        bom = true
        args[0] = mode_string
      end

      f = open_old(filename, *args, **options)
      bom_bytes = f.utf_bom_hex
      if bom && bom_bytes
        case mode_string
        # r|bom already standard since 1.9.2
        when /\Ar/ # read mode -> remove BOM
          # check if it was really a BOM; return to position 0 if not
          f.rewind unless f.read(bom_bytes.bytesize) == bom_bytes
        when /\Aw/ # write mode -> attach BOM
          f.write(bom_bytes.dup.force_encoding(f.external_encoding))
        end
      end

      return f unless block
      begin
        yield f
      ensure
        f.close
      end
    end
  end
end #File
EXAMPLE_TEXT = 'some content öäü'
File.open("file_utf16le.txt", "w:utf-16le|bom"){|f| f << EXAMPLE_TEXT }
File.open("file_utf16le.txt", "r:utf-16le|bom:utf-8"){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }
File.open("file_utf8.txt", "w:utf-8", :bom => true ){|f| f << EXAMPLE_TEXT }
File.open("file_utf8.txt", "r:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8|bom"){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8"){|f| p f.read }
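To check which BOM a file like the ones above actually starts with, here is a small sketch. The names detect_bom and BOMS are hypothetical; the byte sequences are the standard Unicode BOMs. Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the longer sequences are tested first.

```ruby
# Hypothetical table: longest BOMs first, because FF FE 00 00 starts with FF FE.
BOMS = [
  [Encoding::UTF_32LE, "\xFF\xFE\x00\x00".b],
  [Encoding::UTF_32BE, "\x00\x00\xFE\xFF".b],
  [Encoding::UTF_8,    "\xEF\xBB\xBF".b],
  [Encoding::UTF_16LE, "\xFF\xFE".b],
  [Encoding::UTF_16BE, "\xFE\xFF".b],
].freeze

# Hypothetical helper: return the encoding whose BOM starts the file, or nil.
def detect_bom(path)
  head = File.binread(path, 4) || "".b # nil for an empty file
  BOMS.each do |encoding, bom|
    return encoding if head.start_with?(bom)
  end
  nil
end
```

A file without a BOM yields nil. Note that a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE; that ambiguity is inherent to BOM sniffing.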
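Some remarks:
- A bom flag in the mode string (|bom in the examples) serves as a bom indicator; Ruby 1.9 itself recognizes a bom| prefix for reading, as in "r:bom|utf-8".

Some fixes needed to make it better:
- Use |bom instead of -bom as the flag, to match the examples.

Perhaps I will find some time tomorrow to refactor my code and provide it as a gem.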
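For the reading side alone, Ruby's built-in bom| encoding prefix may be enough without patching File.open. A minimal sketch for UTF-8 files; read_without_bom is a hypothetical name.

```ruby
# Hypothetical helper: read a UTF-8 file, dropping a leading BOM if there is one.
# "bom|utf-8" is Ruby's built-in mode: it strips a leading BOM and uses UTF-8 when none is present.
def read_without_bom(path)
  File.open(path, "r:bom|utf-8", &:read)
end
```

Files with and without a BOM then come back as the same string, so callers don't have to special-case either.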