I'm using Excel 2010 Professional Plus to create an Excel file. Later on I'm trying to export it as a UTF-8 .csv file. I do this by saving it as CSV (symbol separated....
From what you say, I suspect Excel writes a UTF-8 file without a BOM, which makes guessing that the encoding is UTF-8 slightly trickier. You can confirm this diagnosis in Notepad++: the characters should appear correctly when you select Format -> Encode in UTF-8 (without BOM) (rather than Format -> Convert to UTF-8 (without BOM)).
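If you want to make that check scriptable rather than eyeballing it in Notepad++, here is a minimal PHP sketch that tests whether a file starts with the UTF-8 BOM (the file name 'file.csv' is a placeholder):

$handle = fopen('file.csv', 'r');
$firstBytes = fread($handle, 3); // the UTF-8 BOM is the 3-byte sequence EF BB BF
fclose($handle);
if ($firstBytes === "\xEF\xBB\xBF") {
    echo "UTF-8 BOM present\n";
} else {
    echo "No UTF-8 BOM (the file may still be BOM-less UTF-8)\n";
}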
And are you sure every user is going to use UTF-8? It sounds to me like you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean guessing that recognizes BOM-less UTF-8.
To cut to the chase, I'd do something like this:
$f = fopen('file.csv', 'r');
while (($row = fgets($f)) !== false) {
    if (mb_detect_encoding($row, 'UTF-8', true) !== false) {
        // The line is already valid UTF-8: parse it as-is
        var_dump(str_getcsv($row, ';'));
    } else {
        // Otherwise assume ISO-8859-1 and convert it to UTF-8 first
        var_dump(str_getcsv(utf8_encode($row), ';'));
    }
}
fclose($f);
This works because you inspect the actual characters to guess the encoding, rather than lazily trusting the first 3 bytes: UTF-8 without a BOM is still recognized as UTF-8. Of course, if your CSV file is not too big, you could run the encoding detection on the whole file contents instead: something like mb_detect_encoding(file_get_contents(...), ...)
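For instance, a minimal sketch of that whole-file variant (the file name and the ISO-8859-1 fallback are my assumptions, not something your data guarantees):

$contents = file_get_contents('file.csv');
// Strict mode (third argument) makes the UTF-8 test trustworthy,
// with or without a BOM at the start of the file.
if (mb_detect_encoding($contents, 'UTF-8', true) === false) {
    $contents = utf8_encode($contents); // assumes the fallback encoding is ISO-8859-1
}
foreach (explode("\n", $contents) as $line) {
    $line = rtrim($line, "\r"); // tolerate Windows line endings
    if ($line !== '') {
        var_dump(str_getcsv($line, ';'));
    }
}

Detecting once on the whole file also avoids the odd case where individual lines get classified differently and end up decoded inconsistently.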