how to find out if csv file fields are tab delimited or comma delimited. I need php validation for this. Can anyone plz help. Thanks in advance.
I'm just counting the occurrences of the different delimiters in the CSV file, the one with the most should probably be the correct delimiter:
//The delimiters array to look through
$delimiters = array(
'semicolon' => ";",
'tab' => "\t",
'comma' => ",",
);
//Load the csv file into a string
$csv = file_get_contents($file);
foreach ($delimiters as $key => $delim) {
$res[$key] = substr_count($csv, $delim);
}
//reverse sort the values, so the [0] element has the most occured delimiter
arsort($res);
reset($res);
$first_key = key($res);
return $delimiters[$first_key];
Thanks for all your inputs, I made mine using your tricks : preg_split, fgetcsv, loop, etc.
But I implemented something that was surprisingly not here, the use of fgets instead of reading the whole file, way better if the file is heavy!
Here's the code :
ini_set("auto_detect_line_endings", true);
function guessCsvDelimiter($filePath, $limitLines = 5) {
if (!is_readable($filePath) || !is_file($filePath)) {
return false;
}
$delimiters = array(
'tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$fp = fopen($filePath, 'r', false);
$lineResults = array(
'tab' => array(),
'comma' => array(),
'semicolon' => array()
);
$lineIndex = 0;
while (!feof($fp)) {
$line = fgets($fp);
foreach ($delimiters as $key=>$delimiter) {
$lineResults[$key][$lineIndex] = count (fgetcsv($fp, 1024, $delimiter)) - 1;
}
$lineIndex++;
if ($lineIndex > $limitLines) break;
}
fclose($fp);
// Calculating average
foreach ($lineResults as $key=>$entry) {
$lineResults[$key] = array_sum($entry)/count($entry);
}
arsort($lineResults);
reset($lineResults);
return ($lineResults[0] !== $lineResults[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];
}
Here's what I do.
This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$handle = file( $file ); # Grabs the CSV file, loads into array
$line = array(); # Stores the count of delimiters in each row
$valid_delimiter = array(); # Stores Valid Delimiters
# Count the number of Delimiters in Each Row
for ( $i = 1; $i < 6; $i++ ){
foreach ( $delimiters as $key => $value ){
$line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}
# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){
# Check that the first two values are not 0
if ( $count[1] > 0 and $count[2] > 0 ){
$match = true;
$prev_value = '';
foreach ( $count as $value ){
if ( $prev_value != '' )
$match = ( $prev_value == $value and $match == true ) ? true : false;
$prev_value = $value;
}
} else {
$match = false;
}
if ( $match == true ) $valid_delimiter[] = $delimiter;
}//foreach
# Set Default delimiter to comma
$delimiter = ( $valid_delimiter[0] != '' ) ? $valid_delimiter[0] : "comma";
/* !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have to user select from the delimiters in $valid_delimiter
!!!! if multiple dilimiter counts match
*/
# The Delimiter for the CSV
echo $delimiters[$delimiter];