How to find out if CSV file fields are tab-delimited or comma-delimited

[愿得一人] 2020-12-01 09:46

How can I find out whether the fields of a CSV file are tab-delimited or comma-delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.

15 answers
  • 2020-12-01 10:32

    I simply count the occurrences of each candidate delimiter in the CSV file; the one that occurs most often is probably the correct delimiter:

    // The delimiters to look for
    $delimiters = array(
        'semicolon' => ";",
        'tab'       => "\t",
        'comma'     => ",",
    );
    
    // Load the CSV file into a string
    $csv = file_get_contents($file);
    $res = array();
    foreach ($delimiters as $key => $delim) {
        $res[$key] = substr_count($csv, $delim);
    }
    
    // Sort descending (keys are preserved), so the first element is the most frequent delimiter
    arsort($res);
    
    reset($res);
    $first_key = key($res);
    
    return $delimiters[$first_key];
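
    A minimal sketch of how the counting idea could be wrapped in a function and fed into fgetcsv. The function name `detectDelimiterByCount`, the candidate list, and the temporary sample file are assumptions made for illustration; they are not part of the answer above.

    ```php
    <?php
    // Hypothetical helper: returns the candidate that occurs most often in $csv.
    // If no candidate occurs at all, the first candidate (comma) is returned.
    function detectDelimiterByCount(string $csv, array $candidates = [",", "\t", ";"]): string
    {
        $counts = [];
        foreach ($candidates as $delim) {
            $counts[$delim] = substr_count($csv, $delim);
        }
        arsort($counts); // highest count first, keys preserved
        return (string) array_key_first($counts);
    }

    // Write a small tab-delimited sample file, detect its delimiter, then parse it.
    $path = tempnam(sys_get_temp_dir(), 'csv');
    file_put_contents($path, "id\tname\tcity\n1\tAnn\tOslo\n2\tBob\tRome\n");

    $delimiter = detectDelimiterByCount(file_get_contents($path));

    $rows = [];
    $fp = fopen($path, 'r');
    while (($row = fgetcsv($fp, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fp);
    unlink($path);
    ```

    Counting over the whole file is simple, but it can be misled by free-text fields that contain many commas or semicolons, so it is best treated as a first guess.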
    
  • 2020-12-01 10:35

    Thanks for all your input. I built mine using your tricks: preg_split, fgetcsv, a loop, etc.

    But I added something that, surprisingly, was not here yet: reading line by line with fgets instead of loading the whole file, which is much better when the file is large.

    Here's the code:

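    // Only needed for old Mac (\r) line endings; this setting is deprecated since PHP 8.1
    ini_set("auto_detect_line_endings", true);
    function guessCsvDelimiter($filePath, $limitLines = 5) {
        if (!is_readable($filePath) || !is_file($filePath)) {
            return false;
        }
    
        $delimiters = array(
            'tab'       => "\t",
            'comma'     => ",",
            'semicolon' => ";"
        );
    
        $fp = fopen($filePath, 'r', false);
        $lineResults = array(
            'tab'       => array(),
            'comma'     => array(),
            'semicolon' => array()
        );
    
        $lineIndex = 0;
        // Read at most $limitLines lines, one at a time
        while ($lineIndex < $limitLines && ($line = fgets($fp)) !== false) {
            foreach ($delimiters as $key => $delimiter) {
                // Parse the line just read; calling fgetcsv() here would consume further lines
                $lineResults[$key][$lineIndex] = count(str_getcsv($line, $delimiter, '"', '\\')) - 1;
            }
    
            $lineIndex++;
        }
        fclose($fp);
    
        // Empty file: nothing to measure, fall back to comma
        if ($lineIndex === 0) {
            return $delimiters['comma'];
        }
    
        // Average number of delimiters per line
        foreach ($lineResults as $key => $entry) {
            $lineResults[$key] = array_sum($entry) / count($entry);
        }
    
        arsort($lineResults);
        reset($lineResults);
        // Compare the two best averages; fall back to comma on a tie
        $averages = array_values($lineResults);
        return ($averages[0] !== $averages[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];
    }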
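
    One reason to prefer a CSV-aware parse over plain character counting is that delimiters inside quoted fields should not count. A small sketch of that effect, assuming double quotes as the enclosure; the helper name `countOutsideQuotes` and the sample line are made up for illustration:

    ```php
    <?php
    // Hypothetical helper: counts a delimiter while ignoring double-quoted sections.
    function countOutsideQuotes(string $line, string $delimiter): int
    {
        // Remove "..." runs, treating "" as an escaped quote inside them
        $unquoted = preg_replace('/"(?:[^"]|"")*"/', '', $line);
        return substr_count($unquoted, $delimiter);
    }

    $line = '"Doe, John";42;"Oslo, Norway"'; // made-up sample line

    $rawCommas     = substr_count($line, ',');       // 2: commas inside the quotes are counted
    $rawSemicolons = substr_count($line, ';');       // 2: ties with the raw comma count
    $commas        = countOutsideQuotes($line, ','); // 0
    $semicolons    = countOutsideQuotes($line, ';'); // 2
    ```

    With raw counts the two candidates tie; once quoted text is ignored, only the semicolon remains.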
    
  • 2020-12-01 10:36

    Here's what I do.

    1. Parse the first 5 lines of a CSV file
    2. Count the number of delimiters [commas, tabs, semicolons and colons] in each line
    3. Compare the number of delimiters in each line. If you have a properly formatted CSV, then one of the delimiter counts will match in each row.

    This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).

    /* Rearrange this array to change the search priority of delimiters */
    $delimiters = array('tab'       => "\t",
                        'comma'     => ",",
                        'semicolon' => ";"
                        );
    
    # Load the CSV file into an array of lines and keep the first 5
    $lines = array_slice( file( $file ), 0, 5 );
    
    $line = array();            # Stores the count of delimiters in each row
    
    $valid_delimiter = array(); # Stores valid delimiters
    
    # Count the number of delimiters in each row
    foreach ( $lines as $i => $row ){
        foreach ( $delimiters as $key => $value ){
            $line[$key][$i] = substr_count( $row, $value );
        }
    }
    
    # Compare the count of delimiters in each line
    foreach ( $line as $delimiter => $count ){
    
        # Valid if the delimiter occurs in the first row and every row has the same count
        if ( $count[0] > 0 and count( array_unique( $count ) ) === 1 ){
            $valid_delimiter[] = $delimiter;
        }
    
    }//foreach
    
    # Set default delimiter to comma when nothing matched
    $delimiter = !empty( $valid_delimiter ) ? $valid_delimiter[0] : "comma";
    
    
    /*  !!!! This is good enough for my needs since I have the priority set to "tab"
    !!!! but you will want to have the user select from the delimiters in $valid_delimiter
    !!!! if multiple delimiter counts match
    */
    
    # The delimiter for the CSV
    echo $delimiters[$delimiter];
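
    The note above suggests letting the user choose when several delimiters match. A minimal sketch of that idea as a reusable function; the name `candidateDelimiters` is hypothetical, and it works on a string instead of a file so that it is easy to try out:

    ```php
    <?php
    // Hypothetical helper: returns the names of every delimiter whose count is
    // non-zero and identical on each of the first $maxLines non-empty lines.
    function candidateDelimiters(string $csv, int $maxLines = 5): array
    {
        $delimiters = ['tab' => "\t", 'comma' => ",", 'semicolon' => ";"];

        // Split on any line ending and drop empty lines
        $lines = preg_split('/\R/', $csv, -1, PREG_SPLIT_NO_EMPTY);
        $lines = array_slice($lines, 0, $maxLines);

        $valid = [];
        foreach ($delimiters as $name => $char) {
            $counts = [];
            foreach ($lines as $line) {
                $counts[] = substr_count($line, $char);
            }
            if ($counts !== [] && $counts[0] > 0 && count(array_unique($counts)) === 1) {
                $valid[] = $name;
            }
        }
        return $valid;
    }

    // Both comma and semicolon are consistent here, so the user should decide.
    $ambiguous = candidateDelimiters("a;b,c\n1;2,3\n4;5,6\n");
    // Only comma is consistent here.
    $clear = candidateDelimiters("a,b,c\n1,2,3\n");
    ```

    If the returned list has exactly one entry it can be used directly; with zero or several entries, asking the user is the safer option.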
    