fwrite() and UTF8

隐瞒了意图╮ 2020-12-05 16:02

I am creating a file using PHP's fwrite(), and I know all my data is in UTF-8 (I have done extensive testing on this: when saving data to db and outputting on normal webpage a

8 answers
  • 2020-12-05 16:30

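    fwrite() itself is binary safe: it writes exactly the bytes it is given. What can alter your data, whether it is correctly encoded or not, is the mode of the underlying stream, since a file opened in text mode may have its line endings translated on some systems.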

    To be on the safe side, use fopen() with the binary mode flag, b (for example "wb"). fwrite() will then save your string data as-is. In PHP that data is binary, because PHP strings are plain byte strings.

    Background: some systems distinguish between text and binary data. On such systems the binary flag explicitly tells PHP to use binary output. When you deal with UTF-8, you need to make sure the data does not get mangled, and handling the string as binary data prevents that.

    However, if the UTF-8 encoding of the data is not actually preserved as you describe in your question, then the encoding was already broken, and binary-safe handling will simply keep it broken. Even so, the binary flag ensures that the fwrite() part of your application is not what is breaking things.
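
    The binary-mode advice above can be sketched as follows. This is a minimal illustration, not the asker's code: the temporary file and the sample text are assumptions made for the example. It writes a UTF-8 string with the "wb" mode and checks that the bytes read back are identical.

    ```php
    <?php
    // Sketch: write a UTF-8 string in binary mode and confirm the bytes on disk are unchanged.
    // The temp file and the sample text are illustrative only.
    $text = "Don\u{2019}t do anything\n";    // U+2019 is the right single quotation mark

    $path = tempnam(sys_get_temp_dir(), 'utf8');
    $handle = fopen($path, 'wb');            // 'b' = binary mode, no newline translation
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = fwrite($handle, $text);       // returns the number of bytes written
    fclose($handle);

    $roundTrip = file_get_contents($path);
    unlink($path);

    var_dump($written === strlen($text));    // strlen() counts bytes, not characters
    var_dump($roundTrip === $text);          // same bytes came back
    ```

    If both checks succeed, the write step is not what changes the data, and the place to look is wherever the string was produced or wherever the file is later read.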

    As another answer here rightly points out, you cannot know the encoding if all you have is the data. You can, however, check whether the data is valid UTF-8, which gives you at least some chance to verify the encoding. I posted a PHP function that does this in a UTF-8 related question, which may help if you need to debug things: see "Answer to: SimpleXML and Chinese" and look for can_be_valid_utf8_statemachine, the name of the function.
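
    For a quick check without that function, one common alternative is to let PCRE do the validation. The sketch below is not can_be_valid_utf8_statemachine; the helper name looks_like_utf8 is made up for this example. It relies on preg_match() failing when the subject is malformed UTF-8 and the /u modifier is set.

    ```php
    <?php
    // Hypothetical helper: true when $bytes is a well-formed UTF-8 sequence.
    function looks_like_utf8(string $bytes): bool
    {
        // With the /u modifier PCRE rejects malformed UTF-8 subjects, so
        // preg_match() returns false instead of a match count.
        return preg_match('//u', $bytes) === 1;
    }

    var_dump(looks_like_utf8("Don\u{2019}t"));  // well-formed UTF-8
    var_dump(looks_like_utf8("Don\x92t"));      // lone Windows-1252 byte, not valid UTF-8
    ```

    A check like this only says whether the bytes are well-formed. Double-encoded text is still valid UTF-8, so it will pass.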

  • 2020-12-05 16:31

    The problem is your data is double encoded. I assume your original text is something like:

    Don’t do anything

    with ’, i.e., not the straight apostrophe, but the right single quotation mark.

    If you write a PHP script with this content and save it as UTF-8:

    <?php
    //File in UTF-8
    echo utf8_encode("Don’t"); //this will double encode
    

    You will get something similar to your output.
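
    To see what double encoding does at the byte level, here is a small sketch. It uses iconv() from ISO-8859-1 to UTF-8, which is the same conversion utf8_encode() performs (utf8_encode() is deprecated as of PHP 8.2), and it assumes the iconv extension is available.

    ```php
    <?php
    // Sketch of double encoding: UTF-8 bytes are misread as ISO-8859-1 and encoded again.
    $once  = "\u{2019}";                            // right single quotation mark
    $twice = iconv('ISO-8859-1', 'UTF-8', $once);   // each byte becomes its own code point

    echo bin2hex($once), "\n";    // e28099
    echo bin2hex($twice), "\n";   // c3a2c280c299

    // Reversing the extra conversion recovers the original bytes.
    $repaired = iconv('UTF-8', 'ISO-8859-1', $twice);
    var_dump($repaired === $once);
    ```

    The three original bytes turn into six, which is why a single ’ shows up as several junk characters in the output file.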

  • 2020-12-05 16:34
    // add a BOM so Excel detects UTF-8
    $bom = chr(0xEF) . chr(0xBB) . chr(0xBF);
    fputs($fp, $bom);
    

    I find this piece works for me :)
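
    A fuller sketch of the same idea, for context. The temporary file and the CSV rows are made up for this example. The BOM has to be the very first bytes of the file, so it is written before any data.

    ```php
    <?php
    // Sketch: prepend a UTF-8 BOM so Excel detects the encoding of a CSV file.
    // The temp file and the rows are illustrative only.
    $path = tempnam(sys_get_temp_dir(), 'csv');
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    fwrite($fp, "\xEF\xBB\xBF");                  // BOM first, before any data
    fwrite($fp, "name,note\r\n");
    fwrite($fp, "Zo\u{00EB},Don\u{2019}t\r\n");   // UTF-8 content follows unchanged
    fclose($fp);

    $firstBytes = file_get_contents($path, false, null, 0, 3);
    unlink($path);
    echo bin2hex($firstBytes), "\n";              // efbbbf
    ```

    The BOM does not change the data itself. It only helps readers such as Excel pick the right encoding, so it will not repair text that is already double encoded.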

  • 2020-12-05 16:34

    Try this simple method: add the following at the top of the page, before the <body> tag:

    <head>
      <meta charset="utf-8">
    </head>
    
  • 2020-12-05 16:41
    $handle = fopen($file, "wb");                    // binary mode
    fwrite($handle, pack("CCC", 0xef, 0xbb, 0xbf));  // UTF-8 BOM
    fwrite($handle, $data);                          // $data: the UTF-8 string to store
    fclose($handle);
    
  • 2020-12-05 16:49

    The only thing I had to do was add a UTF-8 BOM to the CSV. The data was correct, but the file reader (an external application) couldn't read the file properly without the BOM.
