I'm using Excel 2010 Professional Plus to create an Excel file. Later on I'm trying to export it as a UTF-8 .csv file. I do this by saving it as CSV (symbol separated....
And for the people from the Czech Republic:
function convert( $str ) {
    return iconv( "CP1250", "UTF-8", $str );
}
...
while (($data = fgetcsv($this->fhandle, 1000, ";")) !== FALSE) {
    $data = array_map( "convert", $data );
    ...
The problem must be your file's encoding; it looks like it's not UTF-8.
When I tried your example and double-checked that the file is indeed UTF-8, it works for me. I get:
Array ( [0] => 1 [1] => Austria [2] => Österreich )
Use LibreOffice (OpenOffice); it's more reliable for this sort of thing.
From what you say, I suspect Excel writes a UTF-8 file without a BOM, which makes guessing that the encoding is UTF-8 slightly trickier. You can confirm this diagnosis if the characters appear correctly in Notepad++ after selecting Format->Encode in UTF-8 (without BOM) (rather than Format->Convert to UTF-8 (without BOM)).
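To check for a BOM programmatically rather than in an editor, you can read the first three bytes of the file: a UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF. A minimal sketch (the helper name and the file paths under /tmp are my own, not from the question):

```php
<?php
// Returns true if the file at $path starts with the UTF-8 BOM (0xEF 0xBB 0xBF).
function has_utf8_bom(string $path): bool {
    $fh = fopen($path, 'rb');
    $head = fread($fh, 3);
    fclose($fh);
    return $head === "\xEF\xBB\xBF";
}

// Two small sample files to compare: one with a BOM, one without.
file_put_contents('/tmp/with_bom.csv', "\xEF\xBB\xBF" . "1;Austria;Österreich\n");
file_put_contents('/tmp/no_bom.csv', "1;Austria;Österreich\n");

var_dump(has_utf8_bom('/tmp/with_bom.csv')); // bool(true)
var_dump(has_utf8_bom('/tmp/no_bom.csv'));   // bool(false)
```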
And are you sure every user is going to use UTF-8? It sounds to me like you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean guessing that recognizes BOM-less UTF-8.
To cut to the chase, I'd do something like this:
$f = fopen('file.csv', 'r');
while (($row = fgets($f)) !== false) {
    if (mb_detect_encoding($row, 'UTF-8', true) !== false) {
        var_dump(str_getcsv($row, ';'));
    } else {
        var_dump(str_getcsv(utf8_encode($row), ';'));
    }
}
fclose($f);
This works because you inspect the actual characters to guess the encoding, rather than lazily trusting the first 3 bytes, so UTF-8 without a BOM is still recognized as UTF-8. Of course, if your CSV file is not too big, you could do the encoding detection on the whole file's contents: something like mb_detect_encoding(file_get_contents(...), ...)
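The whole-file variant could look like the sketch below; the file path and the ISO-8859-1 fallback encoding are assumptions on my part (the sketch writes its own Latin-1 sample file so it is self-contained):

```php
<?php
// Build a Latin-1 sample file to stand in for the Excel export.
$path = '/tmp/input.csv';
file_put_contents($path,
    mb_convert_encoding("1;Austria;Österreich\n", 'ISO-8859-1', 'UTF-8'));

$contents = file_get_contents($path);

// Strict detection on the whole file: BOM-less UTF-8 still passes,
// because every byte sequence is validated, not just a leading BOM.
if (mb_detect_encoding($contents, 'UTF-8', true) === false) {
    $contents = mb_convert_encoding($contents, 'UTF-8', 'ISO-8859-1');
}

$rows = [];
foreach (explode("\n", trim($contents)) as $line) {
    $rows[] = str_getcsv($line, ';', '"', '\\');
}
print_r($rows);
```

One conversion on the whole buffer also avoids mixed results when a single file happens to contain lines that individually pass the UTF-8 check.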
I don't know why Excel is generating an ANSI file instead of UTF-8 (as you can see in Notepad++), but if this is the case, you can convert the file using iconv:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv
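Before converting, you can sanity-check what encoding you actually have with the file utility (its exact wording varies by platform and version). A sketch that builds its own Latin-1 sample so the commands are reproducible:

```shell
# Create a Latin-1 sample (\xd6 is 'Ö' in ISO-8859-1), inspect, then convert.
printf '1;Austria;\xd6sterreich\n' > my_csv_file.csv
file -b my_csv_file.csv          # prints something like "ISO-8859 text"
iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv
file -b my_csv_file_utf8.csv     # prints something like "UTF-8 Unicode text"
```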
From the PHP documentation:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
You can try:
header('Content-Type: text/html; charset=UTF-8');
$fp = fopen("log.txt", "r");
echo "<pre>";
while (($dataRow = fgetcsv($fp, 1000, ";")) !== FALSE) {
    $dataRow = array_map("utf8_encode", $dataRow);
    print_r($dataRow);
}
fclose($fp);
Output
Array
(
    [0] => ID
    [1] => englishName
    [2] => germanName
)
Array
(
    [0] => 1
    [1] => Austria
    [2] => Österreich
)
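Note that utf8_encode() assumes the input is ISO-8859-1, and it is deprecated as of PHP 8.2; mb_convert_encoding() is the usual replacement. A sketch of the same loop with it (the /tmp path is my own; the sketch writes a small Latin-1 sample so it is self-contained):

```php
<?php
// Build a Latin-1 sample standing in for log.txt from the answer above.
file_put_contents('/tmp/log.txt',
    mb_convert_encoding("1;Austria;Österreich\n", 'ISO-8859-1', 'UTF-8'));

$fp = fopen('/tmp/log.txt', 'r');
$rows = [];
while (($dataRow = fgetcsv($fp, 1000, ';', '"', '\\')) !== false) {
    // Convert each field from ISO-8859-1 (what utf8_encode assumed) to UTF-8.
    $rows[] = array_map(
        fn ($v) => mb_convert_encoding($v, 'UTF-8', 'ISO-8859-1'),
        $dataRow
    );
}
fclose($fp);
print_r($rows);
```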