How to write BOM marker to a file in Ruby

感情败类 2021-01-01 23:15

I have some working code that uses a crutch to add a BOM to a new file.

  #writing
  File.open name, 'w', 0644 do |file|
    file.write "\uFEFF"
    fil         


        
2 Answers
  • 2021-01-01 23:55

    Alas, I think your manual approach is the way to go; at least, I don't know a better way:

    http://blog.grayproductions.net/articles/miscellaneous_m17n_details

    To quote from JEG2's article:

    Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.
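
As a rough illustration of the quoted idea, the sketch below writes the BOM by hand before the content and then checks the first bytes on disk. The helper name write_utf8_with_bom and the temp-file path are made up for this example, and it assumes UTF-8 output.

```ruby
require 'tmpdir'

# Hypothetical helper for this sketch: writes +text+ to +path+ as UTF-8,
# preceded by the BOM.
def write_utf8_with_bom(path, text)
  File.open(path, 'w:UTF-8') do |file|
    file.write "\uFEFF"   # U+FEFF, stored as the bytes EF BB BF in UTF-8
    file.write text
  end
end

path = File.join(Dir.tmpdir, 'bom_manual_example.txt')  # example path only
write_utf8_with_bom(path, "some content\n")

# The first three bytes on disk should be the UTF-8 BOM.
p File.binread(path, 3).bytes   # prints [239, 187, 191]
```

Other encodings need their own byte sequence, so a UTF-16 or UTF-32 file would have to be opened with the matching external encoding.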

  • 2021-01-01 23:59

    **** This answer led to a new gem: file_with_bom ****

    I had a similar problem in the past, and I extended File.open with additional encoding variants for the w-mode:

    class File
      # BOM byte sequence for each supported encoding
      BOM_LIST_hex = {
        Encoding::UTF_8    => "\xEF\xBB\xBF",     # U+FEFF encoded as UTF-8
        Encoding::UTF_16BE => "\xFE\xFF",
        Encoding::UTF_16LE => "\xFF\xFE",
        Encoding::UTF_32BE => "\x00\x00\xFE\xFF",
        Encoding::UTF_32LE => "\xFF\xFE\x00\x00", # little-endian byte order
      }
      BOM_LIST_hex.freeze
      def utf_bom_hex(encoding = external_encoding)
        BOM_LIST_hex[encoding]
      end

      class << self
        alias :open_old :open
        def open(filename, mode_string = 'r', options = {}, &block)
          #work on copies so the caller's arguments are not modified
          mode_string = mode_string.dup
          options = options.dup
          #check for bom-flag in mode_string
          options[:bom] = true if mode_string.sub!(/-bom/i,'')
          #:bom is our own flag, so remove it before delegating
          use_bom = options.delete(:bom)

          #pass the remaining options on as keywords (needed on Ruby 3)
          f = open_old(filename, mode_string, **options)
          if use_bom
            case mode_string
              #reading with "r:BOM|UTF-8" is built in since 1.9.2
              when /\Ar/   #read mode -> remove BOM
                bom = f.read(f.utf_bom_hex.bytesize)
                #check, if it was really a bom
                if bom.nil? || bom != f.utf_bom_hex.dup.force_encoding(bom.encoding)
                  f.rewind  #return to position 0 if BOM was no BOM
                end
              when /\Aw/  #write mode -> attach BOM
                f << f.utf_bom_hex.dup.force_encoding(f.external_encoding)
            end #mode_string
          end

          #without a block, hand the open file back to the caller
          return f unless block_given?
          begin
            yield f
          ensure
            f.close  #close the file even if the block raises
          end
        end
      end
    end #File
    

    Test code:

    EXAMPLE_TEXT = 'some content öäü'
    File.open("file_utf16le.txt", "w:utf-16le-bom"){|f| f << EXAMPLE_TEXT }
    File.open("file_utf16le.txt", "r:utf-16le-bom:utf-8"){|f| p f.read }
    File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true){|f| p f.read }
    File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }  #no bom flag: BOM stays in the result

    File.open("file_utf8.txt", "w:utf-8", :bom => true){|f| f << EXAMPLE_TEXT }
    File.open("file_utf8.txt", "r:utf-8", :bom => true){|f| p f.read }
    File.open("file_utf8.txt", "r:utf-8-bom"){|f| p f.read }
    File.open("file_utf8.txt", "r:utf-8"){|f| p f.read }  #no bom flag: BOM stays in the result
    

    Some remarks:

    • The code dates from pre-1.9 times (but it still works).
    • I used -bom as a BOM indicator (Ruby 1.9 itself uses a BOM| prefix, as in "r:BOM|UTF-8").
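
For the reading side, here is a small sketch of what Ruby's built-in BOM| prefix does, assuming a UTF-8 file that starts with a BOM. The helper names and the temp-file path are only illustrative.

```ruby
require 'tmpdir'

# Reads the file and lets Ruby skip a leading UTF-8 BOM, if there is one.
def read_skipping_bom(path)
  File.open(path, 'r:BOM|UTF-8') { |f| f.read }
end

# Reads the file as plain UTF-8; a leading BOM stays in the string as U+FEFF.
def read_keeping_bom(path)
  File.open(path, 'r:UTF-8') { |f| f.read }
end

path = File.join(Dir.tmpdir, 'bom_read_example.txt')  # example path only
File.binwrite(path, "\xEF\xBB\xBFhello".b)            # BOM bytes + content

p read_skipping_bom(path)                          # prints "hello"
p read_keeping_bom(path).start_with?("\uFEFF")     # prints true
```

The BOM| prefix only applies to reading, which is why writing still needs a manual step.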

    Some needed fixes to be better:

    • use a | separator (as Ruby does in BOM|UTF-8) instead of -bom
    • use the standard "r:BOM|UTF-8" for reading
    • make it work with both Ruby 1.8 and 1.9
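
An alternative that avoids redefining File.open might look like the following sketch. open_with_bom is a hypothetical helper (it is not the API of the file_with_bom gem), and the temp-file path is just an example. It relies on Ruby transcoding written strings to the file's external encoding, so a single "\uFEFF" yields the right BOM bytes for the chosen encoding.

```ruby
require 'tmpdir'

# Hypothetical helper: opens +path+ for writing in +encoding+, writes the BOM
# first, then yields the file so the caller can add the content.
def open_with_bom(path, encoding)
  File.open(path, "w:#{encoding}") do |file|
    file.write "\uFEFF"   # transcoded to the external encoding on write
    yield file
  end
end

path = File.join(Dir.tmpdir, 'bom_utf16le_example.txt')  # example path only
open_with_bom(path, 'UTF-16LE') { |f| f << 'some content öäü' }

# For UTF-16LE the BOM is the two bytes FF FE.
p File.binread(path, 2).bytes   # prints [255, 254]
```

Because nothing global is patched, other code that calls File.open with a permission argument keeps working as before.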

    Perhaps I will find some time tomorrow to refactor my code and provide it as a gem.
