I have a C# .NET application that accesses data from a commercial application backed by an Oracle 10 db. A couple of fields in the commercial app's database (declared as va
Certain characters in the WE8ISO8859P1 character set have a different binary representation than the same character in UTF8.
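To make that concrete, here is a small sketch (my own illustration, not code from the application in question) that prints the bytes for "é" in ISO-8859-1, the character set that WE8ISO8859P1 names, and in UTF-8. The class and method names are made up for the example.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Returns the bytes of `text` in the given encoding as hex, e.g. "C3-A9".
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string sample = "\u00E9"; // the letter e with an acute accent

        Console.WriteLine(ToHex(sample, latin1));        // E9
        Console.WriteLine(ToHex(sample, Encoding.UTF8)); // C3-A9

        // Reading the single ISO-8859-1 byte as if it were UTF-8 does not
        // give the letter back; the decoder emits U+FFFD instead.
        string misread = Encoding.UTF8.GetString(new byte[] { 0xE9 });
        Console.WriteLine(misread == "\uFFFD");          // True
    }
}
```

So any layer that reads ISO-8859-1 bytes while assuming UTF-8 (or the reverse) will mangle every character above 0x7F.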
I can suggest two possible approaches:
1) Try Oracle's native data provider for .NET (ODP.NET). It may be that Microsoft's System.Data.OracleClient library has a bug (or feature) whereby the adapter does not automatically convert WE8ISO8859P1 to Unicode.
I hope ODP.NET supports this encoding, but to be honest I have never checked; it is only a suggestion.
2) Workaround: in the DataSet, create a binary field (mapped to the original table field) and a string field (not mapped to the database). After you load data into the DataSet, iterate over the rows and convert each byte array to a string.
The code should be something like this:
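// Requires: using System.Data; using System.Text;
Encoding e = Encoding.GetEncoding("iso-8859-1");
foreach (DataRow row in dataset.Tables["MyTable"].Rows)
{
    // Decode the raw bytes only when the column is not NULL.
    if (!row.IsNull("MyByteArrayField"))
        row["MyStringField"] = e.GetString((byte[])row["MyByteArrayField"]);
}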
Postscript for anyone browsing this thread:
Bogdan was very helpful in getting me to the "answer" (such as it is) but as he points out, you might not have identical circumstances.
We communicated with the team responsible for using the commercial software. They had been copying/pasting from Word and Excel, which is how the special characters had been getting inserted.
The problem occurred in the translation of the character between the remote database and our database. The host database uses character set WE8ISO8859P1, whereas ours uses WE8MSWIN1252. Due to corporate-level concerns, modifying either character set is not feasible right now.
I used SYS.UTL_RAW.CAST_TO_RAW(fieldname) to convert the source field to search for 'BF' (the hex code for an inverted question mark in our character set). This at least let me identify the problem record and character. HOWEVER, many different special characters on the remote records would/could be translated to BF. For example, Word's hyphens are not simple "dash" characters, and also get translated to the inverted question mark.
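For anyone who wants to see the "everything collapses to BF" effect outside the database, here is a rough C# simulation. It is not what Oracle does internally; it only assumes that a character with no ISO-8859-1 equivalent is replaced by the inverted question mark (U+00BF, byte BF). The class name and the sample strings are made up for the illustration.

```csharp
using System;
using System.Text;

public static class LossyConversionDemo
{
    // ISO-8859-1 encoder that writes U+00BF (byte 0xBF) for any character
    // it cannot represent. This mimics the symptom, not Oracle's code.
    private static readonly Encoding Latin1WithMarker = Encoding.GetEncoding(
        "iso-8859-1",
        new EncoderReplacementFallback("\u00BF"),
        new DecoderReplacementFallback("?"));

    // Encodes `text` and returns the resulting bytes as hex.
    public static string ToHex(string text)
    {
        return BitConverter.ToString(Latin1WithMarker.GetBytes(text));
    }

    public static void Main()
    {
        // Word's curly apostrophe (U+2019) and en dash (U+2013) are different
        // characters, but both come out as the same BF byte.
        Console.WriteLine(ToHex("it\u2019s")); // 69-74-BF-73
        Console.WriteLine(ToHex("a\u2013b"));  // 61-BF-62
    }
}
```

Once the conversion has happened, the BF byte tells you that something was lost but not which character it originally was.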
dump(fieldname) somehow converts to decimal character codes BEFORE the translation, UNLESS I also used the SYS.UTL_RAW.CAST_TO_RAW in the same query. This caused amazing headaches. dump() by itself could be useful in identifying specific pretranslated characters from the source db.
The best solution would be to use the same character set on both dbs. Since that's not possible for us, we have manually replaced all occurrences of the special characters on the source (remote) db with non-special equivalents (regular apostrophe or hyphen). However, since the commercial software doesn't correct or flag special characters, we may run into this problem again in the future. So, our update application will scan for the inverted question mark and send a notification to the system owner with the ID of the bad record. This, like so many other corporate situations, will have to do. ;-)
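The scan itself could look roughly like the sketch below. This is not our production code: the table layout, the column names ("Id", "Notes") and the class name are placeholders, and the notification step is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class BadRecordScanner
{
    // The inverted question mark that shows up after a lossy conversion.
    private const char Marker = '\u00BF';

    // Returns the IDs of all rows whose text column contains the marker.
    public static List<int> FindSuspectIds(DataTable table, string idColumn, string textColumn)
    {
        var ids = new List<int>();
        foreach (DataRow row in table.Rows)
        {
            if (row.IsNull(textColumn))
                continue; // nothing to scan in a NULL field

            string text = (string)row[textColumn];
            if (text.IndexOf(Marker) >= 0)
                ids.Add((int)row[idColumn]);
        }
        return ids;
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the real table.
        var table = new DataTable("MyTable");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Notes", typeof(string));
        table.Rows.Add(1, "plain text");
        table.Rows.Add(2, "it\u00BFs broken");
        table.Rows.Add(3, DBNull.Value);

        foreach (int id in FindSuspectIds(table, "Id", "Notes"))
            Console.WriteLine("Suspect record: " + id);
    }
}
```

One caveat: if the data can legitimately contain Spanish text, a real inverted question mark will be flagged too, so the notification should be treated as "please review" rather than "definitely corrupt".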
Thanks again, Bogdan!