I have an Oracle database that stores some data values in Simplified Chinese. I have created an ASP.net MVC C# webpage that is supposed to display this information. I am usi
Problems with character set are quite common, let me try to give some general notes.
In principle you have to consider four different character set settings.
NLS_CHARACTERSET
and NLS_NCHAR_CHARACTERSET
Example: AL32UTF8
They are defined only on your database, you can interrogate them with
SELECT *
FROM V$NLS_PARAMETERS
WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
These settings define which characters (in which format) can be stored in your database - no more, no less. It requires some effort (see Character Set Migration and/or Oracle Database Migration Assistant for Unicode) if you have to change it on existing database.
NLS_LANG
Example: AMERICAN_AMERICA.AL32UTF8
This value is defined only on your client. NLS_LANG has nothing to do with the ability to store characters in a database. It is used to let Oracle know what character set you are using on the client side. When you set NLS_LANG value (for example to AL32UTF8) then you just tell the Oracle database "my client uses character set AL32UTF8" - it does not necessarily mean that your client is really using AL32UTF8! (see below #4)
NLS_LANG can be defined by environment variable NLS_LANG
or by Windows Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG
(for 32 bit), resp. HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG
(for 64 bit). Depending on your application there might be other ways to specify NLS_LANG, but let's stick to the basics. If NLS_LANG value is not provided then Oracle defaults it to AMERICAN_AMERICA.US7ASCII
Format of NLS_LANG is NLS_LANG=language_territory.charset
. The {charset} part of NLS_LANG is not shown in any system table or view. All components of the NLS_LANG definition are optional, so following definitions are all valid: NLS_LANG=.WE8ISO8859P1
, NLS_LANG=_GERMANY
, NLS_LANG=AMERICAN
, NLS_LANG=ITALIAN_.WE8MSWIN1252
, NLS_LANG=_BELGIUM.US7ASCII
.
As stated above the {charset} part of NLS_LANG
is not available in database at any system table/view or any function. Strictly speaking this is true, however you can run this query:
SELECT DISTINCT CLIENT_CHARSET
FROM V$SESSION_CONNECT_INFO
WHERE (SID, SERIAL#) = (SELECT SID, SERIAL# FROM v$SESSION WHERE AUDSID = USERENV('SESSIONID'));
It should return character set from your current NLS_LANG
setting - however based on my experience the value is often NULL or Unknown
, i.e. not reliable.
Find more very useful information here: NLS_LANG FAQ
Note, some technologies do not utilize NLS_LANG
, settings there do not have any effect, for example:
ODP.NET Managed Driver is not NLS_LANG
sensitive. It is only .NET locale sensitive. (see Data Provider for .NET Developer's Guide)
OraOLEDB (from Oracle) always use UTF-16 (see OraOLEDB Provider Specific Features)
Java based JDBC (for example SQL Developer) has its own methods to deal with character sets (see Database JDBC Developer's Guide - Globalization Support for further details)
.sql
filesExample: UTF-8
If you work on a Windows terminal (i.e. with SQL*plus) you can interrogate the code page with command chcp
, on Unix/Linux the equivalent is locale charmap
or echo $LANG
. You can get a list of all Windows code pages identifiers from here: Code Page Identifiers. Note, for UTF-8 (chcp 65001
) there are some issues, see this discussion.
If you work with .sql
files and an editor like TOAD or SQL-Developer you have to check the save options. Usually you can choose values like UTF-8
, ANSI
, ISO-8859-1
, etc.
ANSI
means the Windows ANSI codepage, typically CP1252
, you can check in your Registry at HKLM\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP
or here: National Language Support (NLS) API Reference
[Microsoft removed this reference, take it form web-archive National Language Support (NLS) API Reference]
The most important point is to match NLS_LANG
and your "real" character set of your terminal, resp. application or the encoding of your .sql
files
Some common pairs are:
CP850 -> WE8PC850
CP1252 or ANSI (in case of "Western" PC) -> WE8MSWIN1252
ISO-8859-1 -> WE8ISO8859P1
ISO-8859-15 -> WE8ISO8859P15
UTF-8 -> AL32UTF8
Or run this query to get some more:
SELECT VALUE AS ORACLE_CHARSET, UTL_I18N.MAP_CHARSET(VALUE) AS IANA_NAME
FROM V$NLS_VALID_VALUES
WHERE PARAMETER = 'CHARACTERSET';
Some technologies make you life easier, e.g. ODP.NET (unmanged driver) or ODBC driver from Oracle automatically inherits the character set from NLS_LANG
value, so condition from above is always true.
Is it required to set client NLS_LANG value equal to database NLS_CHARACTERSET
value?
No, not necessarily! For example, if you have the database character set NLS_CHARACTERSET=AL32UTF8
and the client character set NLS_LANG=.ZHS32GB18030
then it will work without any problem (provided your client really uses GB18030), although these character sets are completely different. GB18030 is a character set commonly used for Chinese, like UTF-8
it supports all Unicode characters.
If you have, for example NLS_CHARACTERSET=AL32UTF8
and NLS_LANG=.WE8ISO8859P1
it will also work (again, provided your client really uses ISO-8859-P1). However, the database may store characters which your client is not able to display, instead the client will display a placeholder (e.g. ¿
).
Anyway, it is beneficial to have matching NLS_LANG and NLS_CHARACTERSET values, if suitable. If they are equal you can be sure that any character which may be stored in database can also be displayed and any character you enter in your terminal or write in your .sql file can also be stored in database and is not substituted by placeholder.
So many times you can read advise like "The NLS_LANG character set must be the same as your database character set" (also here on SO). This is simply not true and a popular myth!
Here is the proof:
C:\>set NLS_LANG=.AL32UTF8
C:\>sqlplus ...
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 CharSet VARCHAR2(20);
3 BEGIN
4 SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
5 DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
6 IF UNISTR('\20AC') = '€' THEN
7 DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
8 ELSE
9 DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
10 END IF;
11 END;
12 /
Database NLS_CHARACTERSET is AL32UTF8
"€" is not the same as U+20AC
PL/SQL procedure successfully completed.
Both, client and database character sets are AL32UTF8
, however the characters do not match. The reason is, my cmd.exe
and thus also SQL*Plus use Windows CP1252. Therefore I must set NLS_LANG accordingly:
C:\>chcp
Active code page: 1252
C:\>set NLS_LANG=.WE8MSWIN1252
C:\>sqlplus ...
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 CharSet VARCHAR2(20);
3 BEGIN
4 SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
5 DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
6 IF UNISTR('\20AC') = '€' THEN
7 DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
8 ELSE
9 DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
10 END IF;
11 END;
12 /
Database NLS_CHARACTERSET is AL32UTF8
"€" is equal to U+20AC
PL/SQL procedure successfully completed.
Also consider this example:
CREATE TABLE ARABIC_LANGUAGE (
LANG_CHAR VARCHAR2(20),
LANG_NCHAR NVARCHAR2(20));
INSERT INTO ARABIC_LANGUAGE VALUES ('العربية', 'العربية');
You would need to set two different values for NLS_LANG
for a single statement - which is not possible.