Does anybody know the best way to compare the contents of two CSV files and report the identical rows?
By identical I mean records which have the same values.
You have file A and file B.
Parse file A, creating one object per row to hold that row's contents, and store the objects in an array as you create them.
Do the same thing for file B.
So now you have two arrays: the first holds all the rows of file A, and the second holds all the rows of file B.
Now iterate through the first array: for each object in array A, scan array B and check whether it contains the same object. If every element of array A passes this check (and both arrays have the same number of rows), the files are identical. Otherwise, break.
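The steps above could be sketched in PHP roughly like this. It is only a minimal sketch, not a definitive implementation: the function names readRows and identicalRows are made up for illustration, it assumes both files fit in memory, and it treats two rows as identical when every field matches exactly as a string. Instead of scanning array B once per row of A, it indexes B's rows by a serialized key, which reports the same matches with fewer comparisons.

```php
<?php
// Hypothetical helper: read a CSV file into an array of rows.
function readRows($path) {
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new Exception("Cannot open $path");
    }
    $rows = array();
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === array(null)) {
            continue; // fgetcsv() returns array(null) for a blank line
        }
        $rows[] = $fields;
    }
    fclose($handle);
    return $rows;
}

// Hypothetical helper: return the rows of $rowsA that also occur in $rowsB.
function identicalRows(array $rowsA, array $rowsB) {
    // Index every row of B by its serialized form for quick lookups.
    $seenInB = array();
    foreach ($rowsB as $row) {
        $seenInB[serialize($row)] = true;
    }
    $matches = array();
    foreach ($rowsA as $row) {
        if (isset($seenInB[serialize($row)])) {
            $matches[] = $row;
        }
    }
    return $matches;
}
```

Calling identicalRows(readRows('a.csv'), readRows('b.csv')) would then give the rows of a.csv that also appear somewhere in b.csv, regardless of their position in the file.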
There's a bit of an issue with the code example by rlCH, namely
While it may be enough for the OP, I was looking for a way to compare two multi-line CSV files properly (multi-line as in containing data that spans multiple lines). So I spent the time actually creating one, and I thought, why not share it? Maybe it saves someone a bit of time.
Now, I'm not using PHP from the command line, so if you want to do that I suggest you change the input handling and the output (this one outputs HTML so you can use it in the browser).
Usage: put the script and the files to compare in a directory, then call the script with two parameters, f1 and f2, e.g. compareCSV.php?f1=file1.csv&f2=file2.csv
<?php
//---- init
$strFileName1 = isset($_REQUEST['f1']) ? $_REQUEST['f1'] : '';
$strFileName2 = isset($_REQUEST['f2']) ? $_REQUEST['f2'] : '';
if ( !$strFileName1 ) { die("I need the first file (f1)"); }
if ( !$strFileName2 ) { die("I need the second file (f2)"); }
try {
    $arrFile1 = parseData($strFileName1);
    $arrFile2 = parseData($strFileName2);
} catch (Exception $e) {
    die($e->getMessage());
}
$rowCount1 = count($arrFile1);
$rowCount2 = count($arrFile2);
// Guard against empty files, where row 0 does not exist.
$colCount1 = $rowCount1 ? count($arrFile1[0]) : 0;
$colCount2 = $rowCount2 ? count($arrFile2[0]) : 0;
$highestRowCount = max($rowCount1, $rowCount2);
$highestColCount = max($colCount1, $colCount2);
$row = 0;
$err = 0;
//---- code
// Escape the file names too, since they come straight from the request.
$strName1 = htmlentities($strFileName1);
$strName2 = htmlentities($strFileName2);
echo "<h2>comparing $strName1 and $strName2</h2>";
echo "\n<table border=1>";
echo "\n<tr><th>Err<th>Row#<th>Col#<th>Data in $strName1<th>Data in $strName2";
while ($row < $highestRowCount) {
    if (!isset($arrFile1[$row])) {
        echo "\n<tr><td>Row missing in $strName1<td>$row";
        $err++;
    } elseif (!isset($arrFile2[$row])) {
        echo "\n<tr><td>Row missing in $strName2<td>$row";
        $err++;
    } else {
        // Compare up to the longer of the two rows, so no column is
        // skipped and no column missing from both rows is reported.
        $colCount = max(count($arrFile1[$row]), count($arrFile2[$row]));
        $col = 0;
        while ($col < $colCount) {
            if (!array_key_exists($col, $arrFile1[$row])) {
                echo "\n<tr><td>Data missing in $strName1<td>$row<td>$col<td><td>".htmlentities($arrFile2[$row][$col]);
                $err++;
            } elseif (!array_key_exists($col, $arrFile2[$row])) {
                echo "\n<tr><td>Data missing in $strName2<td>$row<td>$col<td>".htmlentities($arrFile1[$row][$col])."<td>";
                $err++;
            } elseif ($arrFile1[$row][$col] !== $arrFile2[$row][$col]) {
                // Strict comparison: with != PHP would treat "1.0" and "1" as equal.
                echo "\n<tr><td>Data mismatch";
                echo "<td>$row <td>$col";
                echo "<td>".htmlentities($arrFile1[$row][$col]);
                echo "<td>".htmlentities($arrFile2[$row][$col]);
                $err++;
            }
            $col++;
        }
    }
    $row++;
}
echo "</table>";
if ( !$err ) {
    echo "<br/>\n<br/>\nThe two csv data files seem identical<br/>\n";
} else {
    echo "<br/>\n<br/>\nThere are $err differences";
}
//---- functions
function parseData($strFilename) {
    $arrParsed = array();
    $handle = fopen($strFilename, "r");
    if (!$handle) {
        throw new Exception("File read error at $strFilename");
    }
    // fgetcsv() handles quoted fields that span multiple lines
    // and returns false at end of file.
    while (($data = fgetcsv($handle, 0, ',', '"')) !== false) {
        // A blank line comes back as an array holding a single null.
        if ($data === array(null)) continue;
        $arrParsed[] = $data;
    }
    fclose($handle);
    return $arrParsed;
}
?>
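For anyone who does want to run the comparison from the command line, the input handling could be adapted along these lines. This is a sketch only: getFileNames is a hypothetical helper, not part of the script above, and it assumes the two file names arrive as the first two command-line arguments under the CLI and as f1 and f2 in the request otherwise.

```php
<?php
// Hypothetical helper: pick the two file names from the CLI arguments
// or from the request parameters f1 and f2, depending on how PHP runs.
function getFileNames($sapi, array $argv, array $request) {
    if ($sapi === 'cli') {
        $f1 = isset($argv[1]) ? $argv[1] : '';
        $f2 = isset($argv[2]) ? $argv[2] : '';
    } else {
        // basename() strips any directory part from a name sent by the browser
        $f1 = isset($request['f1']) ? basename($request['f1']) : '';
        $f2 = isset($request['f2']) ? basename($request['f2']) : '';
    }
    if ($f1 === '' || $f2 === '') {
        throw new InvalidArgumentException('Two file names are required');
    }
    return array($f1, $f2);
}
```

It could be called as list($strFileName1, $strFileName2) = getFileNames(PHP_SAPI, isset($argv) ? $argv : array(), $_REQUEST); on the command line the echo statements would also need to print plain text instead of HTML table rows.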
I think this is the actual code of which Lord Vader speaks:
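#!/usr/bin/php
<?php
if ($argc < 3) {
    exit("Usage: {$argv[0]} file1.csv file2.csv\n");
}
$strFile1 = $argv[1];
$strFile2 = $argv[2];
// Split every line of the file on commas and return an array of rows.
// Note: explode() does not understand quoted fields or multi-line records.
function parseData($strFilename) {
    $arrAllData = array();
    foreach (file($strFilename, FILE_IGNORE_NEW_LINES) as $strLineData) {
        // Append each row instead of overwriting the previous one.
        $arrAllData[] = explode(',', $strLineData);
    }
    return $arrAllData;
}
$arrFile1 = parseData($strFile1);
$arrFile2 = parseData($strFile2);
// Walk file 1 cell by cell; extra rows or columns in file 2 are not detected.
foreach ($arrFile1 as $intRow => $arrRow) {
    foreach ($arrRow as $intCol => $strVal) {
        if (!isset($arrFile2[$intRow][$intCol]) || ($arrFile2[$intRow][$intCol] !== $strVal)) {
            exit("Column $intCol, row $intRow of $strFile1 doesn't match\n");
        }
    }
}
print "All rows match fine.\n";
?>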