Finding and removing non-ASCII characters from an Oracle VARCHAR2

猫巷女王i 2020-12-02 23:03

We are currently migrating one of our Oracle databases to UTF8, and we have found a few records that are near the 4000-byte VARCHAR2 limit. When we try to migrate these reco

17 answers
  • 2020-12-02 23:28

    I think this will do the trick:

    -- your_column and your_table are placeholders for the real names
    SELECT REGEXP_REPLACE(your_column, '[^[:print:]]', '') FROM your_table;
    
  • 2020-12-02 23:35

    I had a similar requirement (to avoid this ugly ORA-31061: XDB error: special char to escaped char conversion failed), but I had to keep the line breaks.

    I tried this pattern from an excellent comment:

    '[^ -~|[:space:]]'
    

    but got ORA-12728: invalid range in regular expression.

    However, it led me to my solution:

    select t.*, regexp_replace(deta, '[^[:print:]|[:space:]]', '#') from  
        (select '-   <- strangest thing here, and I want to keep line break after
    -' deta from dual ) t
    

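    displays (in my TOAD tool) as: [screenshot not preserved]

    • The pattern replaces every character that is not (^) in either set: printing characters ([:print:]) or whitespace ([:space:]). Inside the brackets the | is just a literal pipe character, not an "or".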
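
    To see the effect on a small input, here is a minimal sketch (the test string is made up for illustration). It assumes that CHR(7), the bell control character, counts as neither printable nor whitespace, while CHR(10), the line feed, falls under [:space:]:

    ```sql
    -- Hypothetical test string: 'a', a bell character, 'b', a line feed, 'c'.
    -- The bell should be replaced by '#', and the line feed should be kept.
    SELECT REGEXP_REPLACE('a' || CHR(7) || 'b' || CHR(10) || 'c',
                          '[^[:print:][:space:]]', '#') AS cleaned
    FROM dual;
    ```

    The | is left out of the pattern here, since inside a bracket expression it only adds a literal pipe character to the set.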
  • 2020-12-02 23:36

    Try the following:

    -- To detect
    select 1 from dual
    where regexp_like(trim('xx test text æ¸¬è© ¦ “xmx” number²'),'['||chr(128)||'-'||chr(255)||']','in')
    
    -- To strip out
    select regexp_replace(trim('xx test text æ¸¬è© ¦ “xmxmx” number²'),'['||chr(128)||'-'||chr(255)||']','',1,0,'in')
    from dual
    
  • 2020-12-02 23:40

    The answer given by Francisco Hayoz is the best. Don't use PL/SQL functions if SQL can do it for you.

    Here is a simple test in Oracle 11.2.03:

    select s
         , regexp_replace(s,'[^'||chr(1)||'-'||chr(127)||']','') "rep ^1-127"
         , dump(regexp_replace(s,'['||chr(127)||'-'||chr(225)||']','')) "rep 127-255"
    from (
    select listagg(c, '') within group (order by c) s
      from (select 127+level l,chr(127+level) c from dual connect by level < 129))
    

    And "rep 127-255" is

    Typ=1 Len=30: 226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

    i.e. for some reason this version of Oracle does not replace chr(226) and above. Using '['||chr(127)||'-'||chr(225)||']' gives the desired result. If you need to replace other characters, just add them to the regex above, or use nested replace/regexp_replace calls if the replacement is different from '' (the null string).

  • 2020-12-02 23:41

    The select may look like the following sample:

    -- your_table is a placeholder; nvalue is the column to check
    select nvalue from your_table
    where length(asciistr(nvalue)) != length(nvalue)
    order by nvalue;
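
    The reason this works, as far as I understand it: ASCIISTR returns plain ASCII and rewrites every other character as a \XXXX escape, so the result is longer than the input whenever the value contains a non-ASCII character. A minimal sketch on dual (UNISTR is only used here to build an é without depending on the client encoding):

    ```sql
    -- UNISTR('caf\00E9') builds the 4-character string 'café'.
    -- ASCIISTR turns the é back into the escape \00E9, giving 8 characters.
    SELECT LENGTH(UNISTR('caf\00E9'))           AS original_length,
           ASCIISTR(UNISTR('caf\00E9'))         AS escaped,
           LENGTH(ASCIISTR(UNISTR('caf\00E9'))) AS escaped_length
    FROM dual;
    ```

    One caveat worth checking on your own data: I believe ASCIISTR also escapes a literal backslash (as \005C), so values that contain backslashes could be reported as well.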
    
  • 2020-12-02 23:44

    I found the answer here:

    http://www.squaredba.com/remove-non-ascii-characters-from-a-column-255.html

    CREATE OR REPLACE FUNCTION O1DW.RECTIFY_NON_ASCII(INPUT_STR IN VARCHAR2)
    RETURN VARCHAR2
    IS
    str VARCHAR2(2000);
    act number :=0;
    cnt number :=0;
    askey number :=0;
    OUTPUT_STR VARCHAR2(2000);
    begin
    -- wrap the input in '^' sentinels; they are stripped again at the end
    str:='^'||TO_CHAR(INPUT_STR)||'^';
    cnt:=length(str);
    for i in 1 .. cnt loop
    askey :=0;
    select ascii(substr(str,i,1)) into askey
    from dual;
    -- drop control characters (< 32) and anything from DEL (127) upwards
    if askey < 32 or askey >=127 then
    str :='^'||REPLACE(str, CHR(askey),'');
    end if;
    end loop;
    OUTPUT_STR := trim(ltrim(rtrim(trim(str),'^'),'^'));
    RETURN (OUTPUT_STR);
    end;
    /
    

    Then run this to update your data:

    update o1dw.rate_ipselect_p_20110505
    set NCANI = RECTIFY_NON_ASCII(NCANI);
    