Character looks like ASCII 63 but isn't, so I can't remove it

半城伤御伤魂 submitted on 2019-12-10 17:33:26

Question


I'm reading text from a text file. The first string I read from the file is "Algood " (note the trailing space). In Notepad there appears to be a space at the end of this string, but there isn't one. When I inspect the 6th character (zero-based index) in Visual Studio's QuickWatch, it appears as:

"�"c

When I use the Asc function to get its ASCII code, it returns 63, which is a question mark. But when I test whether the string contains ASCII 63, the test returns false. So the string appears to contain the character with ASCII code 63, yet it doesn't: it contains some other character that Asc reports as 63. This is a problem, because I can't remove the character if I don't know what to call it. I could remove the last character, but not every string in the text file contains this character.

The question is: what is this character, if not a question mark, and how can I uniquely identify it so I can remove it?


Answer 1:


It is the Unicode replacement character, U+FFFD, aka ChrW(&HFFFD).

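Never use Asc() or Chr(); they are legacy VB6 functions that do not handle Unicode. Use AscW() and ChrW(), which work with the full UTF-16 character code, instead. Passing a fancy Unicode code point that the current ANSI code page cannot represent to Asc() produces 63, the character code for "?"c, aka "I have no idea what you're saying". It is exactly the same idea as "�"c, but expressed as an ASCII code instead.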
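A minimal VB.NET sketch of how to tell the two characters apart: print each character's real code point with AscW() instead of Asc(). The sample string and the helper names DescribeChars and NarrowToAscii are hypothetical. NarrowToAscii uses Encoding.ASCII to mimic the kind of single-byte narrowing Asc() performs through the ANSI code page; it is an analogy, not Asc()'s exact implementation.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharInspector
    ' Lists every character of s as "U+XXXX". AscW returns the full UTF-16
    ' code unit, so nothing is collapsed to "?" the way it is with Asc().
    Function DescribeChars(s As String) As String
        Dim sb As New StringBuilder()
        For Each ch As Char In s
            If sb.Length > 0 Then sb.Append(" "c)
            sb.Append("U+").Append(AscW(ch).ToString("X4"))
        Next
        Return sb.ToString()
    End Function

    ' Analogy for what Asc() runs into: narrowing a character to a single-byte
    ' encoding. ASCII has no slot for U+FFFD, so the encoder substitutes "?" (63).
    Function NarrowToAscii(ch As Char) As Integer
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(New Char() {ch})
        Return bytes(0)
    End Function

    Sub Main()
        ' Hypothetical sample matching the question: "Algood" plus U+FFFD.
        Dim s As String = "Algood" & ChrW(&HFFFD)

        Console.WriteLine(DescribeChars(s))          ' U+0041 U+006C U+0067 U+006F U+006F U+0064 U+FFFD
        Console.WriteLine(NarrowToAscii(s(6)))        ' 63, the value Asc() reported
        Console.WriteLine(s.Contains("?"c))           ' False: no real "?" in the string
        Console.WriteLine(s.Contains(ChrW(&HFFFD)))   ' True
    End Sub
End Module
```

Once DescribeChars shows U+FFFD, that exact code point (ChrW(&HFFFD)) is the unique name to search for or remove.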

Seeing the Black Diamond of Death come back is always bad news: something went wrong when the string was converted from the underlying byte values, because some byte values did not decode to a valid character. Typically the file was read with a different encoding than the one it was saved in. That is what you should really be looking for, because you always want to avoid GIGO. Garbage In, Garbage Out is an ugly data corruption problem that has no winners, only victims. You.
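To see where the black diamond comes from, the sketch below decodes the same bytes twice. It assumes, purely for illustration, that the file was saved as ISO-8859-1 (Latin-1, code page 28591) and that the apparent trailing "space" is byte &HA0, a non-breaking space in that encoding; the real file's encoding has to be determined, not guessed. HasReplacementChar is a hypothetical helper.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module DecodeDemo
    ' Hypothetical helper: decodes raw bytes with the given encoding and reports
    ' whether the decoder had to substitute U+FFFD for an invalid byte sequence.
    Function HasReplacementChar(raw As Byte(), enc As Encoding) As Boolean
        Return enc.GetString(raw).IndexOf(ChrW(&HFFFD)) >= 0
    End Function

    Sub Main()
        ' Assumed file contents: "Algood" followed by &HA0, which is a
        ' non-breaking space in ISO-8859-1 but an invalid lone byte in UTF-8.
        Dim raw As Byte() = {&H41, &H6C, &H67, &H6F, &H6F, &H64, &HA0}
        Dim latin1 As Encoding = Encoding.GetEncoding(28591)

        ' Wrong guess: decoding as UTF-8 turns &HA0 into the black diamond.
        Dim asUtf8 As String = Encoding.UTF8.GetString(raw)
        Console.WriteLine(HasReplacementChar(raw, Encoding.UTF8))   ' True
        Console.WriteLine(AscW(asUtf8(6)).ToString("X4"))           ' FFFD

        ' Right guess: decoding as Latin-1 yields U+00A0, a real (non-breaking) space.
        Dim asLatin1 As String = latin1.GetString(raw)
        Console.WriteLine(HasReplacementChar(raw, latin1))          ' False
        Console.WriteLine(AscW(asLatin1(6)).ToString("X4"))         ' 00A0
    End Sub
End Module
```

If this is the cause, the cleaner fix is to read the file with its actual encoding, for example File.ReadAllText(path, Encoding.GetEncoding(28591)) or a StreamReader constructed with that Encoding, rather than stripping characters after the damage is done.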




Answer 2:


I have written the following function in Excel VBA, which removes the "black diamond" from a single cell.

The hardest part is avoiding a loop over every character of every field just to find it. I needed a way to identify the black diamond without checking every character of every field.

I used an ADODB recordset: if the recordset does not accept the string, the string contains an invalid character. The function then looks for characters whose Asc() value is 63 ("?") and rebuilds and trims the cell value without the black diamond.

This works because, when the function loops through the characters of the string, Asc() reports the black diamond as 63. A real question mark, on the other hand, is accepted by the recordset, so a string containing only genuine question marks never reaches the loop.

Private Function Correct_Black_Diamond(ByVal First_Address As Variant) As String
    Dim CheckDigit As Long
    Dim Current_Char As String
    Dim Temp_string As String
    Dim temp_Rs As New ADODB.Recordset   ' requires a reference to Microsoft ActiveX Data Objects

    ' In-memory recordset used as a validity test: assigning a string that
    ' contains an invalid character raises an error.
    temp_Rs.Fields.Append "address", adChar, 9999
    temp_Rs.Open

    temp_Rs.AddNew
    On Error GoTo Further_Address_Check
    temp_Rs!Address = First_Address
    temp_Rs.Update

    ' The string was accepted, so return it unchanged.
    Correct_Black_Diamond = First_Address
    Exit Function

Further_Address_Check:
    ' Rebuild the string, skipping every character that Asc() reports as 63,
    ' so that all black diamonds are removed, not just the last one.
    For CheckDigit = 1 To Len(First_Address)
        Current_Char = Mid(First_Address, CheckDigit, 1)
        If Asc(Current_Char) <> 63 Then Temp_string = Temp_string & Current_Char
    Next CheckDigit
    Correct_Black_Diamond = Trim(Temp_string)
End Function
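For comparison, a minimal VB.NET sketch of the same cleanup without the ADODB round trip, assuming the unwanted character is U+FFFD: look for that exact code point and remove it with String.Replace. Because it compares against ChrW(&HFFFD) rather than Asc(...) = 63, genuine question marks survive even in a string that also contains a black diamond. CorrectBlackDiamond is a hypothetical name modeled on the VBA function above.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module BlackDiamondCleaner
    ' U+FFFD, the Unicode replacement character ("black diamond").
    Private ReadOnly ReplacementChar As Char = ChrW(&HFFFD)

    ' Returns s without any U+FFFD characters; "?" (U+003F) is left alone.
    Function CorrectBlackDiamond(s As String) As String
        ' Quick check first: most strings contain no replacement character.
        If s.IndexOf(ReplacementChar) < 0 Then Return s
        Return s.Replace(ReplacementChar.ToString(), String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(CorrectBlackDiamond("Algood" & ChrW(&HFFFD)))        ' Algood
        Console.WriteLine(CorrectBlackDiamond("Why?" & ChrW(&HFFFD) & "Not"))  ' Why?Not
    End Sub
End Module
```

IndexOf still examines the characters internally, so this does not avoid looking at every character; it simply does it in one library call instead of a hand-written loop.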



Answer 3:


Use:

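LDM_MSG = LDM_MSG.Replace(ChrW(8203), "")

Instead of:

LDM_MSG = LDM_MSG.Replace(Chr(63), "")

It solves the problem. Note that ChrW(8203) is U+200B, the zero-width space, so this fix applies when the invisible character is a zero-width space rather than the replacement character U+FFFD. Also, String.Replace returns a new string, so the result must be assigned back.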
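A hedged generalization of this answer: if you don't know in advance which invisible character is present, you can strip a small set of known offenders in one pass. The list below (ChrW(8203), the zero-width space, and ChrW(&HFFFD), the replacement character) is an assumption to extend for your own data, and StripUnwanted is a hypothetical helper.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module InvisibleCharStripper
    ' Assumed list of characters to drop; extend it for your own data.
    Private ReadOnly Unwanted As Char() = {ChrW(8203), ChrW(&HFFFD)}

    ' Copies s, skipping any character found in Unwanted.
    Function StripUnwanted(s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each ch As Char In s
            ' Array.IndexOf compares exact code units, so "?"c (63) is kept.
            If Array.IndexOf(Unwanted, ch) < 0 Then sb.Append(ch)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim msg As String = "Al" & ChrW(8203) & "good?" & ChrW(&HFFFD)
        ' Like String.Replace, StripUnwanted returns a new string; assign it.
        msg = StripUnwanted(msg)
        Console.WriteLine(msg)   ' Algood?
    End Sub
End Module
```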



Source: https://stackoverflow.com/questions/25838639/character-looks-like-ascii-63-but-isnt-so-i-cant-remove-it
