Question
I'm reading text from a text file. The first string read from the file is "Algood " (note the trailing space). In Notepad it looks as if the string ends in a space, but it doesn't. When I inspect the character at zero-based index 6 in Visual Studio's QuickWatch, it appears as:
"�"c
When I use the Asc function to get the ASCII code, it tells me that the ASCII code is 63, which is a question mark. But when I test whether the string contains ASCII 63, the test is false. So the string appears to contain the character with ASCII code 63, only it doesn't: it contains some other character that tests as ASCII code 63. This is a problem: I can't remove the character if I don't know what to call it. I could remove the last character, but not every string in the text file contains this character.
The question is: what is this character if not a question mark, and how can I uniquely identify it so I can remove it?
Answer 1:
It is the Unicode replacement character, U+FFFD, aka ChrW(&HFFFD).
Never use Asc() or Chr(); they are legacy VB6 functions that do not handle Unicode. Passing Asc() a Unicode code point that the current ANSI code page cannot represent produces 63, the character code for "?"c, aka "I have no idea what you're saying". It's the exact same idea as "�"c, but using an ASCII code instead.
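As a rough illustration of the point above, AscW and ChrW work on Unicode code units, so they can both identify and remove the character. This is only a sketch: the input string is made up to mimic the one in the question, and StripReplacementChar is a hypothetical helper name.

```vbnet
Imports System

Module ReplacementCharDemo
    ' Hypothetical helper: removes every U+FFFD from the input
    Function StripReplacementChar(ByVal value As String) As String
        ' Strings are immutable in .NET, so Replace returns a new string
        Return value.Replace(ChrW(&HFFFD), "")
    End Function

    Sub Main()
        ' Made-up input: "Algood" followed by U+FFFD instead of a real space
        Dim s As String = "Algood" & ChrW(&HFFFD)

        ' AscW reports the Unicode code unit, unlike Asc
        Console.WriteLine(AscW(s(6)).ToString("X4"))       ' FFFD
        Console.WriteLine(s.Contains(ChrW(&HFFFD)))         ' True
        Console.WriteLine(StripReplacementChar(s).Length)   ' 6
    End Sub
End Module
```

Removing the character this way only hides the symptom; the text that the bad bytes originally stood for is already lost at this point.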
Seeing the Black Diamond of Death is always bad news: something went wrong when the string was converted from the underlying byte values, because some byte values did not produce a valid character. That is what you really should be looking for; you always want to avoid GIGO. Garbage In, Garbage Out is an ugly data corruption problem that has no winners, only victims. You.
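To make the byte-conversion failure concrete, here is a small sketch of how the same bytes decode under two encodings. The byte values are an assumption (the question does not show the file's bytes): they spell "Algood" followed by 0xA0, which is a no-break space in ISO-8859-1 but is not valid on its own in UTF-8. DecodeWith is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text

Module DecodingDemo
    ' Hypothetical helper: decodes raw bytes with the given encoding
    Function DecodeWith(ByVal raw As Byte(), ByVal enc As Encoding) As String
        Return enc.GetString(raw)
    End Function

    Sub Main()
        ' Assumed file content: "Algood" plus a single 0xA0 byte
        Dim raw As Byte() = {&H41, &H6C, &H67, &H6F, &H6F, &H64, &HA0}

        ' Wrong guess: a lone 0xA0 is invalid UTF-8, so the decoder substitutes U+FFFD
        Dim garbled As String = DecodeWith(raw, Encoding.UTF8)
        Console.WriteLine(AscW(garbled(6)).ToString("X4"))   ' FFFD

        ' Matching encoding: the byte survives as U+00A0, a no-break space
        Dim intact As String = DecodeWith(raw, Encoding.GetEncoding("iso-8859-1"))
        Console.WriteLine(AscW(intact(6)).ToString("X4"))    ' 00A0
    End Sub
End Module
```

If the real file behaves like this, passing the matching encoding when reading it (for example as the second argument to File.ReadAllText) would avoid the replacement character in the first place.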
Answer 2:
I wrote the following function in Excel VBA, which removes the "black diamond" from a single cell.
The hardest part is avoiding a loop over every character of every field to find it. I needed a way to identify the black diamond without checking all characters of all fields.
I used an ADODB recordset: if the string is not accepted by the recordset, it contains an invalid character. The function then looks for a character where Asc() = 63 ("?") and trims the cell down to the text without the black diamond.
This works because, when it loops through each character in the string, it recognizes the black diamond as Asc = 63. A real question mark is accepted by the recordset.
Private Function Correct_Black_Diamond(ByVal First_Address As Variant) As String
    Dim CheckDigit As Long
    Dim Current_Char As String
    Dim Temp_string As String
    Dim temp_Rs As New ADODB.Recordset

    ' Disconnected recordset with one text field, used only as a validity check
    temp_Rs.Fields.Append "address", adChar, 9999
    temp_Rs.Open
    temp_Rs.AddNew

    On Error GoTo Further_Address_Check
    ' The assignment is expected to fail if the recordset rejects the value
    temp_Rs!Address = First_Address
    temp_Rs.Update
    Correct_Black_Diamond = First_Address
    Exit Function

Further_Address_Check:
    ' Rebuild the string, skipping characters that Asc() reports as 63
    ' but that are not a literal question mark
    For CheckDigit = 1 To Len(First_Address)
        Current_Char = Mid(First_Address, CheckDigit, 1)
        If Not (Asc(Current_Char) = 63 And Current_Char <> "?") Then
            Temp_string = Temp_string & Current_Char
        End If
    Next CheckDigit
    Correct_Black_Diamond = Trim(Temp_string)
End Function
Answer 3:
Use:
LDM_MSG.Replace(ChrW(8203), "")
Instead of:
LDM_MSG.Replace(Chr(63), "")
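It solves the problem. Note that ChrW(8203) is U+200B, the zero-width space, and that if LDM_MSG is a String, Replace returns a new string, so the result needs to be assigned.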
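To find out which invisible character a string actually holds before choosing what to replace, one option is to list its code units. A minimal sketch in VB.NET; DescribeChars is a hypothetical helper name, not something from the answers above.

```vbnet
Imports System
Imports System.Collections.Generic

Module CodePointDump
    ' Hypothetical helper: lists each UTF-16 code unit of a string as U+XXXX
    Function DescribeChars(ByVal value As String) As String
        Dim parts As New List(Of String)
        For Each c As Char In value
            parts.Add("U+" & AscW(c).ToString("X4"))
        Next
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' Made-up input: "ok" followed by the character from the Replace call above
        Console.WriteLine(DescribeChars("ok" & ChrW(8203)))   ' U+006F U+006B U+200B
    End Sub
End Module
```

Once the code unit is known, it can be passed to ChrW in the Replace call instead of guessing.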
Source: https://stackoverflow.com/questions/25838639/character-looks-like-ascii-63-but-isnt-so-i-cant-remove-it