I have extended ASCII characters in Oracle table data which I am able to extract to a file using sqlplus with the \ escape character prefixed. I want to use nzload to load the exact same data into Netezza.
I'm not very savvy with Unicode conversion issues, but I've done this to myself before, and I'll demonstrate what I think is happening.
I believe what you are seeing here is not an issue with loading special characters with nzload; rather, it's an issue with how your display/terminal software is displaying the data and/or how Netezza is storing the character data. I suspect a double conversion to/from UTF-8 (the Unicode encoding that Netezza supports). Let's see if we can suss out which it is.
Here I am using PuTTY with the Remote Character Set set to Latin-1 (the default for me).
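If you'd like to follow along, a file like mine can be recreated with a bash-style printf that understands \x escapes (the two bytes c2 bf are the UTF-8 encoding of ¿):
$ printf 'PROFESSIONAL\xc2\xbf\n' > input.txt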
$ od -xa input.txt
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
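As an aside, od -x prints 16-bit words in the machine's byte order, which on my little-endian box is why the bytes appear pair-swapped (5250 rather than 5052). To see the bytes in file order, this should work:
$ od -An -tx1 input.txt
 50 52 4f 46 45 53 53 49 4f 4e 41 4c c2 bf 0a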
$ cat input.txt
PROFESSIONALÂ¿
Here we can see from od that the file has only the data we expect; however, when we cat the file we see the extra character (the Â). If it's not in the file, then the character is likely coming from the display translation.
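Incidentally, you can simulate the Latin-1 misreading without touching the terminal settings at all. Reading the file as Latin-1 and re-encoding it to UTF-8 (so a UTF-8 terminal can render what a Latin-1 terminal would have shown) reproduces the stray character; this assumes you have iconv available:
$ iconv -f LATIN1 -t UTF-8 input.txt
PROFESSIONALÂ¿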
If I change the PuTTY settings to have UTF-8 be the remote character set, we see it this way:
$ od -xa input.txt
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
$ cat input.txt
PROFESSIONAL¿
So: the same source data, but two different on-screen representations, which are, not coincidentally, the same as your two different outputs. The same data can be displayed in at least two ways.
Now let's see how it loads into Netezza, once into a VARCHAR column, and again into an NVARCHAR column.
create table test_enc_vchar (col1 varchar(50));
create table test_enc_nvchar (col1 nvarchar(50));
$ nzload -db testdb -df input.txt -t test_enc_vchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_VCHAR' completed successfully
$ nzload -db testdb -df input.txt -t test_enc_nvchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_NVCHAR' completed successfully
The data loaded with no errors. Note that while I specify the escapechar option for nzload, none of the characters in this particular sample of input data require escaping, nor are they escaped.
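As a quick sanity check that a single row landed in each table (plain SQL, nothing Netezza-specific), both of these should return 1:
select count(*) from test_enc_vchar;
select count(*) from test_enc_nvchar;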
I will now use the rawtohex function from the SQL Extension Toolkit as an in-database inspection tool, much as we used od from the command line.
select rawtohex(col1) from test_enc_vchar;
RAWTOHEX
------------------------------
50524F46455353494F4E414CC2BF
(1 row)
select rawtohex(col1) from test_enc_nvchar;
RAWTOHEX
------------------------------
50524F46455353494F4E414CC2BF
(1 row)
At this point both columns appear to hold exactly the same bytes as the input file (minus the trailing newline, which nzload treats as the end of the record). So far, so good.
What if we select the column? For the record, I am doing this in a PuTTY session with remote character set of UTF-8.
select col1 from test_enc_vchar;
COL1
----------------
PROFESSIONALÂ¿
(1 row)
select col1 from test_enc_nvchar;
COL1
---------------
PROFESSIONAL¿
(1 row)
Same binary data, but different display. If I then copy the output of each of those selects into echo piped to od:
$ echo PROFESSIONALÂ¿ | od -xa
0000000 5250 464f 5345 4953 4e4f 4c41 82c3 bfc2
P R O F E S S I O N A L C stx B ?
0000020 000a
nl
0000021
$ echo PROFESSIONAL¿ | od -xa
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
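As a cross-check, running the doubled output back through iconv in the opposite direction collapses c3 82 c2 bf back to c2 bf, and the stray Â disappears (again assuming iconv):
$ echo PROFESSIONALÂ¿ | iconv -f UTF-8 -t LATIN1
PROFESSIONAL¿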
Based on all of this, I'd wager that you are loading your sample data, which I'd also wager is UTF-8, into a VARCHAR column rather than an NVARCHAR column. This is not, in and of itself, a problem, but it can lead to display/conversion issues down the line.
Generally speaking, you'd want to load UTF-8 data into NVARCHAR columns.
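If you've already loaded UTF-8 data into a VARCHAR column, be aware that a straight INSERT ... SELECT into an NVARCHAR table will bake the double conversion in, since (as I understand Netezza's internals) VARCHAR is stored as Latin-9 and NVARCHAR as UTF-8, and the server converts what it believes to be Latin-9 characters on the way. One repair path I'd sketch, untested here and assuming nzsql's psql-style -A/-t/-c/-o options plus iconv, is to export the (doubly converted) text, undo the extra conversion, and reload into the NVARCHAR table:
$ nzsql -db testdb -A -t -c "select col1 from test_enc_vchar" -o dump.txt
$ iconv -f UTF-8 -t LATIN1 dump.txt > fixed.txt
$ nzload -db testdb -df fixed.txt -t test_enc_nvchar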