VBA - Convert string to UNICODE

南方客 2020-12-01 20:21

I need to convert the string HTML from a mix of Cyrillic and Latin symbols to UNICODE.

I tried the following:

Public HTML As String
    Sub HTMLsearc         


        
2 Answers
  • 2020-12-01 20:39

    VBA's support for Unicode is not all that great.

    It is possible to handle Unicode strings, but you will not be able to see the actual characters with Debug.Print or MsgBox - they will appear as ? there.

    You can set Control Panel > Region and Language > Administrative tab > "Current language for non-Unicode programs" to "Russian" to switch to a different code page, which would let you see Cyrillic letters in VBA message boxes instead of question marks. But that's only a cosmetic change.


    Your real problem is something else here.

    The server (nfs.mobile.bg) sends the document as Content-Type: text/html. There is no information about character encoding. That means the receiver must figure out character encoding on its own.

    A browser does that by looking at the response byte stream and making guesses. In your case, a helpful <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> tag is present in the HTML source. Therefore, the byte stream should be interpreted as Windows-1251, which happens to be the Cyrillic ANSI code page in Windows.

    So, we do not even have Unicode here!

    In the absence of any additional info, the responseText property of the XMLHTTP object assumes UTF-8. The Windows-1251 bytes that encode Cyrillic letters are not valid UTF-8 sequences, so they cannot be decoded and the characters are lost. That's why you can't use responseText for anything here.

    However, the original bytes of the response are still available, in the responseBody property, which is an array of Byte.

    In VBA you must do the same thing a browser would do: interpret the byte stream as a certain character set. The ADODB.Stream object can do that for you, and it's pretty straightforward, too:

    ' reference: "Microsoft XML, v6.0" (or any other version)
    ' reference: "Microsoft ActiveX Data Objects 6.1 library" (or any other version)
    Option Explicit
    
    Sub HTMLsearch()
        Dim url As String, html As String
    
        url = "http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1"
        html = GetHTML(url, "Windows-1251")
    
        ' Cyrillic characters are supported in Office, so they will appear correctly
        ActiveDocument.Range.InsertAfter html
    End Sub
    
    Function GetHTML(Url As String, Optional Charset As String = "UTF-8") As String
        Dim request As New MSXML2.XMLHTTP
        Dim converter As New ADODB.Stream
    
        ' fetch page
        request.Open "GET", Url, False
        request.send
    
        ' write raw bytes to the stream
        converter.Open
        converter.Type = adTypeBinary
        converter.Write request.responseBody
    
        ' switch the stream to text mode and set charset
        converter.Position = 0
        converter.Type = adTypeText
        converter.Charset = Charset
    
        ' read text characters from the stream, close the stream
        GetHTML = converter.ReadText
        converter.Close
    End Function
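
    To try the decoding step on its own, without a network request, the same stream technique can be applied to a hand-built byte array. This is a minimal sketch, not part of the original answer: BytesToString and DemoDecode are hypothetical names, and late binding via CreateObject is assumed so that no library reference is required.

    ```vba
    ' Hypothetical helper: decode a byte array with the named charset via ADODB.Stream
    Function BytesToString(bytes() As Byte, Charset As String) As String
        Const adTypeBinary As Long = 1, adTypeText As Long = 2
        Dim converter As Object
        Set converter = CreateObject("ADODB.Stream")   ' late binding, no reference needed

        ' write the raw bytes, then re-read them as text in the given charset
        converter.Open
        converter.Type = adTypeBinary
        converter.Write bytes
        converter.Position = 0
        converter.Type = adTypeText
        converter.Charset = Charset
        BytesToString = converter.ReadText
        converter.Close
    End Function

    Sub DemoDecode()
        Dim raw(0 To 5) As Byte, s As String

        ' the six letters of the Russian word "Привет", encoded as Windows-1251
        raw(0) = &HCF: raw(1) = &HF0: raw(2) = &HE8
        raw(3) = &HE2: raw(4) = &HE5: raw(5) = &HF2

        s = BytesToString(raw, "Windows-1251")

        ' expected: length 6, first code unit 41F (Cyrillic capital Pe)
        Debug.Print Len(s), Hex$(AscW(Left$(s, 1)))
    End Sub
    ```

    Printing the length and a code unit instead of the string itself sidesteps the ? display problem in the Immediate window.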
    

    I've been using MS Word here, and calling HTMLsearch() writes the Cyrillic characters to the page correctly. They still appear as ? in a MsgBox for me, but now that's purely a display problem, caused by the fact that VBA-created UI cannot deal with Unicode.
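
    Because MsgBox and Debug.Print may show ? even when the string is intact, one way to tell a display problem from real data loss is to print the numeric code units instead of the characters. The following is a small sketch with hypothetical names (CodeUnits, DemoCodeUnits); a character that was truly lost would show up as 003F, the code of a literal question mark.

    ```vba
    ' Hypothetical helper: list the UTF-16 code units of a string as 4-digit hex values
    Function CodeUnits(s As String) As String
        Dim i As Long, result As String

        For i = 1 To Len(s)
            If i > 1 Then result = result & " "
            ' AscW returns a signed Integer; mask it to get the unsigned 16-bit value
            result = result & Right$("0000" & Hex$(AscW(Mid$(s, i, 1)) And &HFFFF&), 4)
        Next i

        CodeUnits = result
    End Function

    Sub DemoCodeUnits()
        ' ChrW$(&H416) is the Cyrillic capital letter Zhe
        Debug.Print CodeUnits("A" & ChrW$(&H416))   ' prints 0041 0416
    End Sub
    ```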

  • 2020-12-01 20:40

    My production order data comes from many countries. This is the only VBA function I could find that really works.

    Private Const CP_UTF8 = 65001
    
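    ' VBA7 (Office 2010 and later, including 64-bit) needs PtrSafe and LongPtr for pointer arguments
    #If VBA7 Then
    Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
       ByVal CodePage As Long, ByVal dwFlags As Long, _
       ByVal lpMultiByteStr As LongPtr, ByVal cchMultiByte As Long, _
       ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long
    #Else
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
       ByVal CodePage As Long, ByVal dwFlags As Long, _
       ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
       ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
    #End If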
    
    
    Public Function sUTF8ToUni(bySrc() As Byte) As String
       ' Converts a UTF-8 byte array to a Unicode string
       Dim lBytes As Long, lNC As Long, lRet As Long
    
       lBytes = UBound(bySrc) - LBound(bySrc) + 1
       lNC = lBytes
       sUTF8ToUni = String$(lNC, Chr(0))
       lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bySrc(LBound(bySrc))), lBytes, StrPtr(sUTF8ToUni), lNC)
       sUTF8ToUni = Left$(sUTF8ToUni, lRet)
    End Function
    

    Example Usage:

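    Dim sHTML As String
    Dim bHTML() As Byte
    ' GetHTML must return the raw response bytes here (e.g. responseBody), not a decoded String
    bHTML = GetHTML("http://yoururlhere/myorderdata.php")
    sHTML = sUTF8ToUni(bHTML)
    ' strip the byte order mark (EF BB BF, decoded as U+FEFF) only if one is present
    If Left$(sHTML, 1) = ChrW$(&HFEFF) Then sHTML = Mid$(sHTML, 2)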
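
    The usage above assigns the result of GetHTML to a Byte array, so it relies on a GetHTML that returns the raw response bytes; that function is not shown in this answer. A minimal sketch of what such a helper might look like follows, under the hypothetical name GetResponseBytes and assuming late binding to MSXML2.XMLHTTP.

    ```vba
    ' Hypothetical helper: return the undecoded response body of a GET request
    Function GetResponseBytes(url As String) As Byte()
        Dim request As Object
        Set request = CreateObject("MSXML2.XMLHTTP")   ' late binding, no reference needed

        request.Open "GET", url, False   ' False = synchronous request
        request.send

        ' responseBody holds the bytes exactly as the server sent them
        GetResponseBytes = request.responseBody
    End Function
    ```

    With a helper like this, the fetch line would read bHTML = GetResponseBytes("http://yoururlhere/myorderdata.php").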
    