Dealing with eacute and other special characters using Oracle, PHP and Oci8

橙三吉。 Submitted on 2019-12-04 07:17:06

I presume you are aware of these facts:

  • There are many different character sets: you have to pick one and, of course, know which one you are using.
  • Oracle is perfectly capable of storing text without HTML entities (such as &eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)

You must also know that HTML entities are not bound to a specific charset; on the contrary, they're used to represent characters in a charset-independent context.
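To illustrate that point, here is a minimal PHP sketch (assuming PHP 7 or later; the variable names are only for illustration). The entity itself is plain ASCII, and it is the decoding step that picks the bytes for whatever charset you ask for.

```php
<?php
// The entity is pure ASCII, so it survives unchanged in any charset.
$entity = '&eacute;';

// Decoding chooses the byte representation for the requested charset.
$asLatin1 = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
$asUtf8   = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo bin2hex($asLatin1), "\n"; // e9   (one byte)
echo bin2hex($asUtf8), "\n";   // c3a9 (two bytes)
```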

You mention ISO-8859-1 and UTF-8 as if they were interchangeable. What charset do you want to use? ISO-8859-1 is easy to use, but it can only store text in Western European languages (such as Spanish) and it lacks some common characters like the € symbol. UTF-8 is trickier to use, but it can store every character defined by the Unicode Consortium, which covers everything you'll ever need.
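The difference between the two charsets can be seen in a small PHP sketch. It assumes the mbstring extension is available and uses escape sequences so that it does not depend on the encoding of the source file; the variable names are made up for the example.

```php
<?php
// Written with escapes so the example does not depend on the file's own encoding.
$eAcuteUtf8 = "\u{00E9}"; // é
$euroUtf8   = "\u{20AC}"; // €

$eAcuteLatin1 = mb_convert_encoding($eAcuteUtf8, 'ISO-8859-1', 'UTF-8');
$euroLatin1   = mb_convert_encoding($euroUtf8, 'ISO-8859-1', 'UTF-8');

echo strlen($eAcuteUtf8), "\n";   // 2 bytes in UTF-8
echo strlen($eAcuteLatin1), "\n"; // 1 byte in ISO-8859-1
echo $euroLatin1, "\n";           // "?" (mbstring's default substitute character)
```

The last line is the kind of silent data loss you get when a character does not exist in the target charset.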

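Once you've made the decision, you must configure Oracle to hold data in that charset and choose an appropriate column type. E.g., VARCHAR2 stores text in the database character set, so it is fine for plain ASCII or for whatever that charset covers, while NVARCHAR2 uses the national character set, which is always a Unicode encoding, and is a good fit for Unicode text.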

This is what I finally ended up doing to solve this problem:

Modified the profile of the daemon running PHP to have:

NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

So that the oci8 connection uses ISO-8859-1.
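If changing the daemon's environment is not an option, oci_connect() also accepts a character set as its fourth argument, which takes the place of the charset part of NLS_LANG for that connection. A minimal sketch follows; the helper name, credentials and connection string are hypothetical placeholders, not part of my actual setup.

```php
<?php
// Hypothetical helper: user, password and connection string are placeholders.
function connect_latin1(string $user, string $password, string $dsn)
{
    if (!function_exists('oci_connect')) {
        throw new RuntimeException('The oci8 extension is not loaded.');
    }

    // The fourth argument sets the client character set for this connection.
    $conn = oci_connect($user, $password, $dsn, 'WE8ISO8859P1');
    if ($conn === false) {
        $e = oci_error();
        throw new RuntimeException($e['message'] ?? 'Oracle connection failed.');
    }

    return $conn;
}

// Usage (not run here): $conn = connect_latin1('scott', 'tiger', 'localhost/XE');
```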

Then, in my PHP configuration, I set the default charset to ISO-8859-1:

default_charset = "iso-8859-1"

When I am inserting into an Oracle Table via oci8 from PHP, I do:

utf8_decode($my_sent_value)

And when receiving data from Oracle, printing the variable should just work:

echo $my_received_value;

However, when sending that data over Ajax I had to use:

utf8_encode($my_received_value)
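The reason the Ajax case differs is most likely that JSON must be valid UTF-8. Here is a small sketch of that, assuming the mbstring extension; it uses mb_convert_encoding() because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. The sample value is invented for the example.

```php
<?php
// Simulate a value as it comes back over an ISO-8859-1 connection: "café".
$my_received_value = "caf\xE9";

// json_encode() only accepts valid UTF-8, so the raw Latin-1 bytes are rejected.
$raw = json_encode($my_received_value);
var_dump($raw); // bool(false)

// Same conversion as utf8_encode(), done with mbstring.
$asUtf8 = mb_convert_encoding($my_received_value, 'UTF-8', 'ISO-8859-1');
$json   = json_encode($asUtf8, JSON_UNESCAPED_UNICODE);

// The reverse direction, equivalent to utf8_decode() before an insert.
$backToLatin1 = mb_convert_encoding($asUtf8, 'ISO-8859-1', 'UTF-8');
```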

If you really cannot change the character set that Oracle will use, then how about Base64-encoding your data before storing it in the database? That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 outputs a subset of the ASCII character set, which maps exactly to ISO-8859-1). Base64 encoding increases the length of the string by about 33% (four output characters for every three input bytes), or roughly 37% when MIME-style line breaks are added.
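A quick sketch of the Base64 idea in PHP, to show that the output is ASCII-only and how much longer it gets for one sample input. The sample string is invented, and the size you see depends on the input length.

```php
<?php
// "España, café, niño" as ISO-8859-1 bytes; any byte string works the same way.
$original = "Espa\xF1a, caf\xE9, ni\xF1o";

$encoded = base64_encode($original);

// Only letters, digits, "+", "/" and "=" appear, so any ASCII-compatible
// database charset can hold the result.
$isAsciiOnly = preg_match('/^[A-Za-z0-9+\/=]+$/', $encoded) === 1;

echo strlen($original), "\n"; // 18
echo strlen($encoded), "\n";  // 24

// Decoding restores the exact original bytes.
$decoded = base64_decode($encoded, true);
```

The trade-off is that the stored value can no longer be searched or sorted as text inside the database.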

If your data is only ever going to be displayed as HTML then you might as well store HTML entities as you suggested, but be aware that a single entity can take up to 10 characters per unencoded character, e.g. ϑ is &thetasym;
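To get a feel for that storage cost, this sketch measures one character before and after entity encoding. It assumes the mbstring extension and the HTML 4.01 entity table; the variable names are only illustrative.

```php
<?php
$char = "\u{03D1}"; // GREEK THETA SYMBOL, written as an escape

// Encode with the HTML 4.01 entity table, then decode again.
$entity    = htmlentities($char, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$roundTrip = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Compare the column width needed for the raw character and for its entity.
echo mb_strlen($char, 'UTF-8'), " character becomes ", strlen($entity), " characters\n";
```

Size the column for the encoded length, not for the number of visible characters.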

Javier Campo

I had to face this problem: the Latin American special characters are stored as "?" or "¿" in my Oracle database, and I can't change NLS_CHARACTERSET because we're not the database owners.

So I found a workaround:

1) In the ASP.NET code, create a function that converts a string to hexadecimal characters:

    // Requires: using System.Text;
    // Encodes the input as ISO-8859-1 and returns the bytes as uppercase hex digits.
    public string ConvertirStringAHex(string input)
    {
        Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
        byte[] stringBytes = encoding.GetBytes(input);
        // Each byte becomes two hex digits.
        StringBuilder sbBytes = new StringBuilder(stringBytes.Length * 2);
        foreach (byte b in stringBytes)
        {
            sbBytes.AppendFormat("{0:X2}", b);
        }
        return sbBytes.ToString();
    }

2) Apply the function above to the variable you want to encode, like this:

     myVariableHex = ConvertirStringAHex( myVariable );

In Oracle, use the following:

 PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
 IS
 BEGIN
   -- iTEXTO holds hex digits; turn them back into the original bytes.
   INSERT INTO myTable( SPECIAL_TEXT )
   VALUES ( UTL_RAW.CAST_TO_VARCHAR2( HEXTORAW( iTEXTO ) ) );
   COMMIT;
 END;

Of course, iTEXTO is the Oracle parameter which receives the value of "myVariableHex" from ASP.NET code.
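Since the question is about PHP, the same hex step could look roughly like this there. This is only a sketch: the function name is hypothetical, the input is assumed to be UTF-8, and it needs the mbstring extension.

```php
<?php
// Hypothetical PHP counterpart of ConvertirStringAHex(); input assumed to be UTF-8.
function string_to_latin1_hex(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

    // bin2hex() emits lowercase digits; uppercase matches the {0:X2} format above.
    return strtoupper(bin2hex($latin1));
}

$hex = string_to_latin1_hex("A\u{00F1}o"); // "Año"
echo $hex, "\n"; // 41F16F

// hex2bin() reverses the step and gives back the ISO-8859-1 bytes.
$back = hex2bin($hex);
```

The resulting string would then be passed as iTEXTO, exactly like myVariableHex.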

Hope it helps ... if there's something to improve pls don't hesitate to post your comments.

Sources: http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx https://forums.oracle.com/thread/44799
