What is the character set if default_charset is empty

伪装坚强ぢ 2021-01-19 05:34

From PHP 5.6 onwards, default_charset is set to "UTF-8" by default, as explained e.g. in the php.ini documentation. It says that the string is empty

2 Answers
  • 2021-01-19 05:38

    It seems you should not rely on the internal encoding. The internal character encoding can be seen/set with mb_internal_encoding.

    Example phpinfo() output:

    • PHP Version 5.5.9-1ubuntu4.5
    • default_charset no value

    file1.php

    <?php
    $string = "e";
    echo mb_internal_encoding(); //ISO-8859-1
    

    file2.php

    <?php
    $string = "É";
    echo mb_internal_encoding(); //ISO-8859-1
    

    Both files will output ISO-8859-1 unless you change the internal encoding manually.

    <?php
    echo bin2hex("ö"); //c3b6 (utf-8)
    

    The hex dump of this character is its UTF-8 encoding. If you save the file as UTF-8, the string in this example is 2 bytes long, even though the internal encoding is not set to UTF-8. What you should rely on, therefore, is the character encoding used for the source file.
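
    The byte-length point can be checked without depending on how a file was saved. A minimal sketch, assuming the mbstring extension is loaded; the byte escapes stand in for a literal "ö" saved as UTF-8, and the variable names are illustrative only:

    ```php
    <?php
    // "\xc3\xb6" is the UTF-8 byte sequence for "ö", written with escapes so
    // the result does not depend on the encoding of this source file.
    $s = "\xc3\xb6";

    $byteCount = strlen($s);                   // counts bytes
    $asUtf8    = mb_strlen($s, 'UTF-8');       // counts characters when read as UTF-8
    $asLatin1  = mb_strlen($s, 'ISO-8859-1');  // one byte is one character in ISO-8859-1

    echo bin2hex($s), "\n";                                // c3b6
    echo $byteCount, " ", $asUtf8, " ", $asLatin1, "\n";   // 2 1 2
    ```

    The string holds the same two bytes in all three calls; only the encoding passed to the function changes how they are counted.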

  • 2021-01-19 06:04

    Short answer

    For literal strings, it is always the encoding of the source file. The default_charset value has no effect here.

    Longer answer

    PHP strings are "binary safe", meaning they do not carry any internal string encoding. Basically, strings in PHP are just buffers of bytes.

    For a literal string, e.g. $s = "Ä", this means that the string will contain whatever bytes were saved in the file between the quotes. If the file was saved as UTF-8, this is equivalent to $s = "\xc3\x84"; if the file was saved as ISO-8859-1 (latin1), it is equivalent to $s = "\xc4".
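
    The two cases can be reproduced with byte escapes instead of two differently saved files. This is only a sketch, assuming the mbstring extension is available; the variable names are made up for the example:

    ```php
    <?php
    $utf8   = "\xc3\x84";  // "Ä" as it would be stored from a UTF-8 file
    $latin1 = "\xc4";      // "Ä" as it would be stored from an ISO-8859-1 file

    // Same character on screen, different bytes in memory.
    $sameBytes = ($utf8 === $latin1);

    // Converting between encodings rewrites the bytes; the source encoding
    // has to be named explicitly because PHP cannot know it.
    $converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

    var_dump($sameBytes);             // bool(false)
    echo bin2hex($converted), "\n";   // c384
    ```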

    Setting the default_charset value does not change the bytes stored in strings in any way.

    What does default_charset do then?

    Some functions that have to treat strings as text, and are therefore encoding aware, accept $encoding as an argument (usually optional). It tells the function which encoding the text in the string is in.

    Before PHP 5.6, the default value of these optional $encoding arguments was either fixed in the function definition (e.g. htmlspecialchars()) or configurable through separate php.ini settings for each extension (e.g. mbstring.internal_encoding, iconv.input_encoding).

    In PHP 5.6, default_charset became the common php.ini setting for this purpose and its default value became "UTF-8". The old per-extension settings were deprecated, and all functions that accept an optional $encoding argument should now fall back to the default_charset value when no encoding is specified explicitly.
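
    To illustrate what the $encoding argument changes, here is a small sketch using htmlspecialchars(). It passes the flags and the encoding explicitly, so it does not depend on the local default_charset value; the input string is an invented example:

    ```php
    <?php
    // The value that encoding-aware functions fall back to on PHP 5.6+.
    echo "default_charset: ", var_export(ini_get('default_charset'), true), "\n";

    $latin1 = "\xc4 <b>";  // "Ä <b>" encoded as ISO-8859-1

    // Correct encoding named explicitly: only the markup characters are escaped.
    $ok = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

    // Wrong encoding: "\xc4" followed by a space is not valid UTF-8, so
    // htmlspecialchars() returns an empty string, because neither
    // ENT_SUBSTITUTE nor ENT_IGNORE is set in the flags passed here.
    $bad = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

    var_dump($ok === "\xc4 &lt;b&gt;");  // bool(true)
    var_dump($bad === '');               // bool(true)
    ```

    The bytes in $latin1 are identical in both calls; only the encoding the function is told to assume differs.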

    However, the developer remains responsible for making sure that the text in a string is actually encoded in the encoding that was specified.
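
    One way to act on that responsibility is to validate text before handing it to encoding-aware functions. The helper below is hypothetical (toUtf8 is not a PHP built-in), assumes PHP 7+ with mbstring, and assumes that anything which is not valid UTF-8 is ISO-8859-1, which is an illustration rather than a safe general rule:

    ```php
    <?php
    // Hypothetical helper: returns $text as UTF-8. The ISO-8859-1 fallback is
    // an assumption made for this example; real input needs a known source
    // encoding, since it cannot be detected reliably from the bytes alone.
    function toUtf8(string $text): string
    {
        if (mb_check_encoding($text, 'UTF-8')) {
            return $text;  // already valid UTF-8, leave the bytes untouched
        }
        return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }

    echo bin2hex(toUtf8("\xc3\x84")), "\n";  // c384 (already UTF-8, unchanged)
    echo bin2hex(toUtf8("\xc4")), "\n";      // c384 (converted from ISO-8859-1)
    ```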


    Links:

    • Details of the String Type
      More details on the nature of PHP strings (does not mention default_charset at the time of writing).
    • New features in PHP 5.6: Default character encoding
      Short introduction to the default_charset change in the PHP 5.6 release notes.
    • Deprecated features in PHP 5.6: iconv and mbstring encoding settings
      List of php.ini options deprecated in favour of the default_charset option.