fgetcsv() drops characters with diacritics (i.e. non-ASCII) - how to fix?

执念已碎 2020-12-19 15:33

Similar questions:
Some characters in CSV file are not read during PHP fgetcsv() ,
fgetcsv() ignores special characters when they

1 Answer
  •  有刺的猬
    2020-12-19 16:04

    It turns out that I didn't read the documentation well enough: fgetcsv() is only somewhat binary-safe. It is safe for plain 7-bit ASCII (bytes 0 to 127), but the documentation also says:

    Note:

    Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function

    In other words, fgetcsv() tries to be binary-safe but actually isn't, because it also interprets the bytes according to the current locale's charset. It will probably mangle data whose encoding doesn't match that locale, and the setting is not configured in php.ini but read from $LANG.
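
    If you would rather keep fgetcsv(), one possible workaround is to switch LC_CTYPE to a locale that matches the file's encoding before reading, and restore it afterwards. This is only a sketch: read_csv_with_locale() is a hypothetical helper name, and the locale names you pass in are assumptions, since which locales are installed differs from system to system.

```php
<?php
/**
 * Reads a CSV file with fgetcsv() after trying to switch LC_CTYPE to a
 * locale that matches the file's encoding, then restores the old locale.
 * $locales is a list of candidate names (assumed, system-dependent).
 */
function read_csv_with_locale(string $path, array $locales): array
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() tries each candidate in order and returns false if none
    // is available; in that case the locale simply stays as it was.
    setlocale(LC_CTYPE, $locales);

    $rows = [];
    $handle = fopen($path, 'r');
    if ($handle !== false) {
        // Delimiter, enclosure and escape are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $row;
        }
        fclose($handle);
    }

    // Put the original locale back so other code is not affected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $rows;
}
```

    For a UTF-8 file you might call it as read_csv_with_locale($uploaded_file, ['en_US.UTF-8', 'C.UTF-8']). If none of the candidates exists on the server, the call changes nothing, so it is worth checking the result on data that actually contains diacritics.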

    I've sidestepped the issue by reading the lines with fgets (which works on bytes, not characters) and using a CSV function from the comment in the docs to parse them into an array:

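    $fhandle = fopen($uploaded_file, 'r');
    // Compare against false explicitly: a final line containing only "0" is falsy.
    while (($raw_row = fgets($fhandle)) !== false) { // fgets() is binary-safe
        // csvstring_to_array() is the helper from the docs comment, not a PHP built-in
        $row = csvstring_to_array($raw_row, ',', '"', "\n");
        // $row is now read correctly
    }
    fclose($fhandle);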
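
    The csvstring_to_array() helper itself isn't reproduced here. As a rough idea of what such a function does, below is a minimal byte-wise parser for a single line. It is my own sketch, not the code from the docs comment: parse_csv_line() is a hypothetical name, and it assumes single-byte delimiter and enclosure characters and quotes escaped by doubling.

```php
<?php
/**
 * Splits one CSV line into fields by walking over its bytes, so the
 * result does not depend on the locale. Hypothetical stand-in for
 * csvstring_to_array(); not the function from the PHP docs comments.
 */
function parse_csv_line(string $line, string $delimiter = ',', string $enclosure = '"'): array
{
    $line = rtrim($line, "\r\n");   // drop the line terminator left by fgets()
    $fields = [];
    $field = '';
    $inQuotes = false;
    $length = strlen($line);        // strlen() counts bytes, not characters

    for ($i = 0; $i < $length; $i++) {
        $byte = $line[$i];
        if ($inQuotes) {
            if ($byte !== $enclosure) {
                $field .= $byte;            // ordinary byte inside quotes
            } elseif ($i + 1 < $length && $line[$i + 1] === $enclosure) {
                $field .= $enclosure;       // doubled quote is a literal quote
                $i++;
            } else {
                $inQuotes = false;          // closing quote
            }
        } elseif ($byte === $enclosure) {
            $inQuotes = true;               // opening quote
        } elseif ($byte === $delimiter) {
            $fields[] = $field;             // field boundary
            $field = '';
        } else {
            $field .= $byte;                // multibyte sequences pass through untouched
        }
    }
    $fields[] = $field;                     // last field has no trailing delimiter
    return $fields;
}
```

    Because it works one line at a time, this sketch does not handle quoted fields that contain line breaks; those would need the lines to be joined before parsing.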
    
