Using nzload to load special characters

孤街浪徒 asked 2021-01-27 02:24

I have extended ASCII characters in Oracle table data, which I am able to extract to a file using sqlplus with the \ escape character prefixed. I want to use nzload to load the exact same data into Netezza.

1 Answer
  •  后悔当初
     answered 2021-01-27 03:02

    I'm not very savvy with Unicode conversion issues, but I've done this to myself before, and I'll demonstrate what I think is happening.

    I believe what you are seeing here is not an issue with loading special characters with nzload; rather, it's an issue with how your display/terminal software is displaying the data and/or how Netezza is storing the character data. I suspect a double conversion to/from UTF-8 (the Unicode encoding that Netezza supports). Let's see if we can suss out which it is.
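
    As a quick sketch of the double conversion I have in mind (the file name here is hypothetical), you can reproduce the corruption with iconv: re-encoding bytes that are already UTF-8 as though they were Latin-1 turns the two-byte sequence C2 BF into the four-byte sequence C3 82 C2 BF.

    # Write "PROFESSIONAL" followed by the UTF-8 encoding of ¿ (bytes C2 BF)
    $ printf 'PROFESSIONAL\xc2\xbf\n' > input.txt
    # Mistakenly treat those UTF-8 bytes as Latin-1 and convert to UTF-8 again;
    # C2 becomes C3 82 and BF becomes C2 BF (the double conversion)
    $ iconv -f LATIN1 -t UTF-8 input.txt | od -xa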

    Here I am using PuTTY with the default (for me) Remote Character Set as Latin-1.

    $ od -xa input.txt
    0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
              P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
    0000017
    
    $ cat input.txt
    PROFESSIONALÂ¿
    

    Here we can see from od that the file has only the data we expect; however, when we cat the file we see an extra character (the Â). If it's not in the file, then the character must be coming from the display translation.

    If I change the PuTTY settings to have UTF-8 be the remote character set, we see it this way:

    $ od -xa input.txt
    0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
              P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
    0000017
    $ cat input.txt
    PROFESSIONAL¿
    

    So, the same source data, but two different on-screen representations, which are, not coincidentally, the same as your two different outputs. The same data can be displayed at least two ways.
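
    If you're not on PuTTY, or want to confirm what your shell session assumes (a setting separate from the terminal emulator's character set, though ideally the two agree), you can check the locale's encoding directly:

    # Print the character encoding of the current locale, e.g. UTF-8 or ISO-8859-1
    $ locale charmap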

    Now let's see how it loads into Netezza, once into a VARCHAR column, and again into an NVARCHAR column.

    create table test_enc_vchar (col1 varchar(50));
    create table test_enc_nvchar (col1 nvarchar(50));
    
    $ nzload -db testdb -df input.txt -t test_enc_vchar -escapechar '\' -ctrlchars
    Load session of table 'TEST_ENC_VCHAR' completed successfully
    $ nzload -db testdb -df input.txt -t test_enc_nvchar -escapechar '\' -ctrlchars
    Load session of table 'TEST_ENC_NVCHAR' completed successfully
    

    The data loaded with no errors. Note that while I specify the -escapechar option for nzload, none of the characters in this particular sample of input data require escaping, and none are escaped.
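
    For completeness, here is a sketch of when escaping would matter (a hypothetical file, assuming nzload's default "|" field delimiter): a literal pipe in the data has to be written as \| in the file so it isn't taken as a field separator.

    # The data value is "ABC|DEF"; the pipe is escaped in the file
    $ printf 'ABC\\|DEF\n' > escaped.txt
    $ nzload -db testdb -df escaped.txt -t test_enc_vchar -escapechar '\'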

    I will now use the rawtohex function from the SQL Extension Toolkit as an in-database analog of the od we used at the command line.

    select rawtohex(col1) from test_enc_vchar;
               RAWTOHEX
    ------------------------------
     50524F46455353494F4E414CC2BF
    (1 row)
    
    select rawtohex(col1) from test_enc_nvchar;
               RAWTOHEX
    ------------------------------
     50524F46455353494F4E414CC2BF
    (1 row)
    

    At this point both columns seem to have exactly the same data as the input file, and C2 BF is precisely the UTF-8 encoding of ¿ (U+00BF, the inverted question mark). So far, so good.

    What if we select the column? For the record, I am doing this in a PuTTY session with the remote character set set to UTF-8.

    select col1 from test_enc_vchar;
          COL1
    ----------------
     PROFESSIONALÂ¿
    (1 row)
    
    select col1 from test_enc_nvchar;
         COL1
    ---------------
     PROFESSIONAL¿
    (1 row)
    

    Same binary data, but a different display. If I then copy the output of each of those selects into echo and pipe it to od:

    $ echo PROFESSIONALÂ¿ | od -xa
    0000000    5250    464f    5345    4953    4e4f    4c41    82c3    bfc2
              P   R   O   F   E   S   S   I   O   N   A   L   C stx   B   ?
    0000020    000a
             nl
    0000021
    
    $ echo PROFESSIONAL¿ | od -xa
    0000000    5250    464f    5345    4953    4e4f    4c41    bfc2    000a
              P   R   O   F   E   S   S   I   O   N   A   L   B   ?  nl
    0000017
    

    Based on this output, I'd wager that you are loading your sample data, which I'd also wager is UTF-8, into a VARCHAR column rather than an NVARCHAR column. This is not, in and of itself, a problem, but it can lead to display/conversion issues down the line.

    Generally speaking, you'd want to load UTF-8 data into NVARCHAR columns.
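
    If you're stuck with a VARCHAR column, one alternative (a sketch with a hypothetical file name; it only works if every character in the data has a Latin-9 equivalent) is to convert the file before loading instead:

    # Convert UTF-8 input to Latin-9 (the encoding Netezza assumes for VARCHAR);
    # iconv will fail on any character with no Latin-9 mapping
    $ iconv -f UTF-8 -t LATIN9 input.txt > input_latin9.txt
    $ nzload -db testdb -df input_latin9.txt -t test_enc_vchar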
