Why do I get an “Invalid Byte Sequence in UTF-8” error reading a text file?

刺人心 2021-01-26 18:26

I'm writing a Ruby script to process a large text file, and I keep getting an odd encoding error. Here's the situation:

input_data = File.new(in_path, 'r').rea


        
5 answers
  •  一向 (original poster)
    2021-01-26 19:25

    Your input file is evidently not valid UTF-8 (or at least, not entirely). If you don't care about the non-ASCII characters, you can simply treat the file as ASCII-8BIT (binary) encoded. Incidentally, your separator (break_char) is not what causes the problem, because a comma is encoded the same way in UTF-8 as in ASCII.

    fname = 'test.in'
    
    # create example file and fill it with invalid UTF-8 sequence
    File.open(fname, 'w') do |f|
      f.write "\xc3\x28"
    end
    
    # then try to read and parse it
    s = File.open(fname) do |f| # opened with the default external encoding (UTF-8 here)
    #s = File.open(fname, 'r:ascii-8bit') do |f| # alternative: open as ASCII-8BIT (binary)
      f.read
    end
    p s.split ','
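
The two ways of handling such a file can be put side by side in a minimal sketch. The helper names `split_raw` and `split_scrubbed` are hypothetical and not part of the original answer. The first follows the suggestion above and reads the file as ASCII-8BIT. The second is an alternative not mentioned above: it keeps the UTF-8 label and replaces invalid bytes with `String#scrub`, which assumes Ruby 2.1 or later.

```ruby
require 'tempfile'

# Hypothetical helper: read the file as raw bytes (ASCII-8BIT) and split it.
# No UTF-8 validation takes place, so invalid sequences pass through untouched.
def split_raw(path, sep = ',')
  data = File.binread(path) # the returned string is tagged ASCII-8BIT
  data.split(sep.b)         # the separator is tagged as binary as well
end

# Hypothetical helper: read the file as UTF-8, then replace every invalid
# byte sequence with a placeholder before splitting.
def split_scrubbed(path, sep = ',', replacement = '?')
  text = File.read(path, encoding: 'UTF-8')
  text = text.scrub(replacement) unless text.valid_encoding?
  text.split(sep)
end

# Demo: a file containing the same invalid sequence as in the example above.
Tempfile.create('invalid_utf8') do |f|
  f.binmode
  f.write("a,\xC3\x28,b".b)
  f.flush

  p split_raw(f.path)      # => ["a", "\xC3(", "b"]
  p split_scrubbed(f.path) # => ["a", "?(", "b"]
end
```

Reading as binary keeps every byte but leaves the pieces tagged as ASCII-8BIT, whereas scrubbing yields valid UTF-8 strings at the cost of losing the offending bytes. Which trade-off is acceptable depends on what the script does with the data afterwards.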
    
