问题
There seems to be an encoding issue or bug in PHP with fputcsv() and fgetcsv().
The following PHP code:
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];
print "\nBEFORE:\n";
var_export($row_before);
print "\n";
$fh = fopen($file = 'php://temp', 'rb+');
fputcsv($fh, $row_before);
rewind($fh);
$row_after = fgetcsv($fh);
print "\nAFTER:\n";
var_export($row_after);
print "\n\n";
fclose($fh);
Gives me this output:
BEFORE:
array (
0 => 'A',
1 => '["a","\\\\","b"]',
2 => 'B',
)
AFTER:
array (
0 => 'A',
1 => '["a","\\\\',
2 => 'b""]"',
3 => 'B',
)
So clearly, the data is damaged on the way. Originally there were just 3 cells in the row, afterwards there are 4 cells in the row. The middle cell is split thanks to the backslash that is also used as an escape character.
See also https://3v4l.org/nc1oE Or here, with explicit values for delimiter, enclosure, escape_char: https://3v4l.org/Svt7m
Is there any way I can sanitize / escape my data before writing to CSV, to guarantee that the data read from the file will be exactly the same?
Is CSV a fully reversible format?
EDIT: The goal would be a mechanism to properly write and read ANY data as csv, so that after one round trip the data is still the same.
EDIT: I realize that I do not really understand the $escape_char parameter. See also fgetcsv/fputcsv $escape parameter fundamentally broken Maybe an answer to this would also bring us closer to a solution.
回答1:
The culprit is that fputcsv() uses an escape character, which is a non-standard extension to CSV. (Well, as far as RFC 7111 can be regarded as standard.) Basically, this escape character would have to be disabled, but passing an empty string as $escape to fputcsv() doesn't work. Usually, passing a NUL character should give the desired results, however, see https://3v4l.org/MlluN.
回答2:
Using your code with specific delimiters but changing the following line will work...
$enclosure = "'";
I think it may be to do with thinking that the \ is escaping the following quote.
回答3:
As in php, \\
used to escape the backslash(link for PHP manual escape sequence),so for making it as string u need to use one more single quote(' ').
so your input array should be...
$row_before = ['A', json_encode(['a', "'\\'", 'b']), 'B'];
回答4:
This is not PHP bug. It seems that json_encode()
use the same delimiter (,), enclosure (") and escape (\) which is the same as default delimiter, enclosure and escape for both fputcsv()
and fgetcsv()
. You may differentiate enclosure or escape, and delimiter if necessary.
As already answered, in this case it will work by specify enclosure with (') instead:
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];
print "\nBEFORE:\n";
var_export($row_before);
print "\n";
$fh = fopen($file = 'php://temp', 'rb+');
fputcsv($fh, $row_before, ',', "'");
rewind($fh);
$row_after = fgetcsv($fh, 0, ',', "'");
print "\nAFTER:\n";
var_export($row_after);
print "\n\n";
fclose($fh);
回答5:
Contrary to what others are saying, I claim that this is a PHP bug. I am going to report it, and update this answer.
EDIT: Now reported here, https://bugs.php.net/bug.php?id=74713
Discussed in this answer:
- Does changing the delimiter help? -> Not really.
- Could
fputcsv()
be fixed? -> Yes.
Does changing the delimiter help?
It can be shown that this is reproducible with any combination of delimiter, enclosure and escape character.
https://3v4l.org/a29kR
$delimiter = 'X';
$enclosure = 'Y';
$escape_char = "Z";
$row_before = [
'A',
"[{$enclosure}a{$enclosure}{$delimiter}{$enclosure}{$escape_char}{$escape_char}{$enclosure}{$delimiter}{$enclosure}b{$enclosure}]",
'B',
];
print "\nBEFORE:\n";
var_export($row_before);
print "\n";
$fh = fopen($file = 'php://temp', 'rb+');
fputcsv($fh,$row_before,$delimiter,$enclosure, $escape_char);
rewind($fh);
$row_plain = fread($fh, 1000);
print "\nPLAIN:\n";
var_export($row_plain);
print "\n";
rewind($fh);
$row_after = fgetcsv($fh, 500,$delimiter,$enclosure, $escape_char);
print "\nAFTER:\n";
var_export($row_after);
print "\n\n";
fclose($fh);
Output:
BEFORE:
array (
0 => 'A',
1 => '[YaYXYZZYXYbY]',
2 => 'B',
)
PLAIN:
'AXY[YYaYYXYYZZYXYYbYY]YXB
'
AFTER:
array (
0 => 'A',
1 => '[YaYXYZZ',
2 => 'bYY]Y',
3 => 'B',
)
Could fputcsv() be fixed?
For this let's turn back to more common and readable delimiter, enclosure and escape character.
$delimiter = ',';
$enclosure = '"';
$escape_char = "@";
Here the result is:
BEFORE:
array (
0 => 'A',
1 => '["a","@@","b"]',
2 => 'B',
)
PLAIN:
'A,"[""a"",""@@",""b""]",B
'
AFTER:
array (
0 => 'A',
1 => '["a","@@',
2 => 'b""]"',
3 => 'B',
)
We see that the '"@@"'
part is exported as '""@@"'
, while it SHOULD have been exported as '""@@""'
.
In fact, doing this manually with fwrite()
instead of fputcsv()
does fix the problem: https://3v4l.org/4U1CQ
来源:https://stackoverflow.com/questions/44427926/data-gets-garbled-when-writing-to-csv-with-fputcsv-fgetcsv