Finding and removing non-ASCII characters from an Oracle VARCHAR2

前端未结

We are currently migrating one of our Oracle databases to UTF8 and we have found a few records that are near the 4000-byte VARCHAR2 limit. When we try and migrate these reco

17 answers

一整个雨季

2020-12-02 23:49
If the offending characters are NUL bytes (chr(0)), this removes them and trims the surrounding spaces:
```
trim(replace(ntwk_slctor_key_txt, chr(0), ''))
```

小鲜肉

2020-12-02 23:52

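The following also works. CONVERT maps the string from WE8ISO8859P1 to US7ASCII, REGEXP_REPLACE then removes every character that is not in the whitelist, and DUMP shows the resulting bytes so you can check what is left:

```
select dump(a,1016), a from (
SELECT REGEXP_REPLACE (
          CONVERT (
             '3735844533120%$03  ',
             'US7ASCII',
             'WE8ISO8859P1'),
          '[^!@/\.,;:<>#$%&()_=[:alnum:][:blank:]]') a
  FROM DUAL);
```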

小鲜肉

2020-12-02 23:54

There's probably a more direct way using regular expressions. With luck, somebody else will provide it. But here's what I'd do without needing to go to the manuals.

Create a PL/SQL function that receives your input string and returns a VARCHAR2.

In the PL/SQL function, call ASCIISTR() on your input. PL/SQL is needed because the result may be longer than 4000 bytes, and a VARCHAR2 in PL/SQL can hold up to 32K.

ASCIISTR converts the non-ASCII characters to \xxxx notation, so you can use a regular expression to find and remove those escapes. Then return the result.
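
The steps above could be sketched roughly as follows. This is only an illustration: the function name `strip_non_ascii` and the parameter `p_text` are made up for the example, and the pattern assumes every escape produced by ASCIISTR is a backslash followed by exactly four hex digits. ASCIISTR is generally reported to escape a literal backslash as `\005C` as well, so this sketch would also drop backslashes; check that on your own version if your data contains them.

```sql
-- Hypothetical helper; the name and parameter are illustrative only.
CREATE OR REPLACE FUNCTION strip_non_ascii (p_text IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC
IS
  -- A PL/SQL VARCHAR2 can hold up to 32767 bytes, which leaves room for
  -- the escaped form of a 4000-byte column value.
  l_escaped VARCHAR2(32767);
BEGIN
  -- Step 1: turn each non-ASCII character into a \XXXX escape.
  l_escaped := ASCIISTR(p_text);

  -- Step 2: remove every backslash followed by four hex digits.
  RETURN REGEXP_REPLACE(l_escaped, '\\[[:xdigit:]]{4}', '');
END strip_non_ascii;
/
```

It could then be called from SQL, for example `SELECT strip_non_ascii(some_column) FROM some_table;`, where `some_column` and `some_table` are placeholders for your own names.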

轻奢々

2020-12-02 23:54
Please note that whenever you use
```
regexp_like(column, '[A-Z]')
```
Oracle's regexp engine will match certain characters from the Latin-1 range as well: this applies to all characters that look similar to ASCII characters like Ä->A, Ö->O, Ü->U, etc., so that [A-Z] is not what you know from other environments like, say, Perl.

Instead of fiddling with regular expressions, try switching to the NVARCHAR2 datatype before the character set upgrade.

Another approach: instead of cutting away part of the fields' contents, you might try the SOUNDEX function, provided your database contains European (i.e. Latin-1) characters only. Or you could write a function that translates characters from the Latin-1 range into similar-looking ASCII characters, like
- å => a
- ä => a
- ö => o
of course only for text blocks exceeding 4000 bytes when transformed to UTF-8.
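
A minimal sketch of that character mapping, using Oracle's built-in TRANSLATE function rather than a hand-written one. The mapping covers only the three letters listed above plus their uppercase forms, and `my_table` and `my_field` are placeholder names. The filter that restricts the update to rows exceeding 4000 bytes in UTF-8 is left out here.

```sql
-- Each character in the second argument is replaced by the character at the
-- same position in the third argument; all other characters pass through.
SELECT TRANSLATE('Smörgåsbord', 'åäöÅÄÖ', 'aaoAAO') AS ascii_text
  FROM dual;

-- Hypothetical names: my_table and my_field stand in for your own objects.
-- The WHERE clause skips rows that the mapping would not change.
UPDATE my_table
   SET my_field = TRANSLATE(my_field, 'åäöÅÄÖ', 'aaoAAO')
 WHERE my_field <> TRANSLATE(my_field, 'åäöÅÄÖ', 'aaoAAO');
```

TRANSLATE only handles one-to-one replacements, so a mapping such as ß => ss would need a separate REPLACE call.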
慢半拍i

2020-12-02 23:55
If you use the ASCIISTR function to convert the Unicode to literals of the form \nnnn, you can then use REGEXP_REPLACE to strip those literals out, like so...
```
UPDATE table SET field = REGEXP_REPLACE(ASCIISTR(field), '\\[[:xdigit:]]{4}', '')
```
...where field and table are your field and table names respectively.
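
Before running the UPDATE, it may be worth previewing which rows it would touch. The query below is a sketch with placeholder names (`my_table`, `my_field`): it relies on ASCIISTR returning its input unchanged when the value is pure ASCII, so any row where the two differ contains at least one character that would be escaped. For values close to 4000 bytes the escaped form may exceed the SQL VARCHAR2 limit, in which case a PL/SQL function is the safer route.

```sql
-- Placeholder names: substitute your own table and column.
SELECT my_field,
       REGEXP_REPLACE(ASCIISTR(my_field), '\\[[:xdigit:]]{4}', '') AS cleaned
  FROM my_table
 WHERE ASCIISTR(my_field) <> my_field;  -- only rows that ASCIISTR would change
```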