UCS-2 Little Endian to UTF-8 conversion leaves file with many unwanted characters

Posted by 大城市里の小女人 on 2019-12-20 04:53:08

Question


I put together the following script after looking at several different ways to do an encoding conversion with ADODB in VBScript.

Option Explicit

Sub UTFConvert()
    Dim objFSO, objStream, file

    file = "FileToConvert.csv"

    Set objStream = CreateObject( "ADODB.Stream" )
    objStream.Open
    objStream.Type = 2
    objStream.Position = 0
    objStream.Charset = "utf-8"
    objStream.LoadFromFile file
    objStream.SaveToFile file, 2
    objStream.Close
    Set objStream = Nothing
End Sub

UTFConvert

The script is supposed to convert the file from UCS-2 Little Endian, or whichever readable encoding it is in (within limits), to UTF-8. The problem is that once the conversion has finished, the file contains NUL characters before and after every letter, and the bytes 0xFF 0xFE (the UCS-2 LE BOM) at the start. These are visible without turning on any show-symbols option in the editor. I would appreciate help understanding what is limiting this conversion, or any alternative approach I could take.


Answer 1:


Your Stream object loads the file as a UTF-8 encoded file and therefore misinterprets the byte sequences. Read the file with a FileSystemObject instance instead, opening it as Unicode (the -1 format argument), and write it out with the ADODB.Stream object:

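Sub UTFConvert(filename)
  Dim fso, txt, stream

  ' Read the whole file as Unicode (UTF-16 LE): 1 = ForReading, -1 = TristateTrue
  Set fso = CreateObject("Scripting.FileSystemObject")
  txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll

  ' Write the text back out as UTF-8, overwriting the original file
  Set stream = CreateObject("ADODB.Stream")
  stream.Open
  stream.Type     = 2 'text
  stream.Position = 0
  stream.Charset  = "utf-8"
  stream.WriteText txt
  stream.SaveToFile filename, 2 '2 = adSaveCreateOverWrite
  stream.Close
End Sub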
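
If you would rather stay entirely within ADODB, the same idea can probably be expressed with two streams: one that decodes the file as "unicode" (ADODB's charset name for UTF-16 LE) and one that encodes the text as "utf-8". The following is only a sketch under the assumption that the source file really is UTF-16 LE. The names UTFConvertStreams, inStream and outStream are made up for illustration and do not come from the original script.

```vbscript
Option Explicit

' Hypothetical helper: re-encode a UTF-16 LE file as UTF-8 using two streams.
' Assumes the source file really is UTF-16 LE ("unicode" in ADODB terms).
Sub UTFConvertStreams(filename)
  Dim inStream, outStream, txt

  ' Decode the source bytes as UTF-16 LE.
  Set inStream = CreateObject("ADODB.Stream")
  inStream.Open
  inStream.Type    = 2 'adTypeText
  inStream.Charset = "unicode"
  inStream.LoadFromFile filename
  txt = inStream.ReadText   ' with no argument, reads the whole stream
  inStream.Close

  ' Encode the decoded text as UTF-8 and overwrite the original file.
  Set outStream = CreateObject("ADODB.Stream")
  outStream.Open
  outStream.Type    = 2 'adTypeText
  outStream.Charset = "utf-8"
  outStream.WriteText txt
  outStream.SaveToFile filename, 2 'adSaveCreateOverWrite
  outStream.Close
End Sub

UTFConvertStreams "FileToConvert.csv"
```

The key point in either version is that the charset used for reading has to match the file's actual encoding. Also note that ADODB.Stream writes a UTF-8 BOM (EF BB BF) at the start of the output, which may or may not be what the consumer of the CSV expects.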


Source: https://stackoverflow.com/questions/32343659/ucs-2-little-endian-to-utf-8-conversion-leaves-file-with-many-unwanted-character
